Driven User Guide

version 2.1.4-eap-9

Counter Data and Other Metrics in Tables

Tables map detailed information to all areas of application performance on Hadoop clusters. Each table shows some basic information, which is listed under Common metrics in the column chooser. For example, a table on an Application View page provides the application name and status in each row. You can also show or hide Time, Duration, and Tags (if applicable) metrics by using the column chooser.

Driven can provide deeper insights in the tables if you choose to display data from counters. Counters feed big data application execution metrics to the Driven Plugin from Hadoop, Cascading, and other frameworks. Standard counters are listed under the Counter Aliases heading in the column chooser. In addition, if custom counters are programmed in the application, Driven also surfaces these counters in the column chooser as attributes that you can import into your tables.

Raw counter data usually describes a very granular level of an application run. For example, a counter could report the number of user profiles that are running an application, but report it separately for each step of each application instance. You receive accurate step-level data about the number of user profiles running the application, but you do not get a cumulative total of user profiles running applications. Fragmented data like this gives you an incomplete picture of the application. In addition, Hadoop retains counter data for only a short period of time. Without a storage mechanism, people and systems have no single point of access to historical data. The usual workaround is to check logs, which is unwieldy and time-consuming.

A key benefit of Driven is that it aggregates all the data reported from a counter to the application level, while still reporting at the unit-of-work and step levels if you want to stay at the granular level. Driven maintains the historical data and makes it easy to retrieve dynamically. You can filter the counter data to narrow what you see. At a higher level, you can easily compare counter data from one application to another.
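The roll-up from step level to application level can be pictured with a small sketch. This is only an illustration of the idea, not Driven's implementation: the record layout, the application and step identifiers, and the USER_PROFILES counter name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical step-level readings: (application, step, counter name, value).
# This layout is an assumption made for illustration, not Driven's data model.
step_readings = [
    ("app-1", "step-1", "USER_PROFILES", 120),
    ("app-1", "step-2", "USER_PROFILES", 80),
    ("app-2", "step-1", "USER_PROFILES", 45),
]


def aggregate_by_application(readings):
    """Sum step-level counter values into one total per (application, counter)."""
    totals = defaultdict(int)
    for app, _step, counter, value in readings:
        totals[(app, counter)] += value
    return dict(totals)


app_totals = aggregate_by_application(step_readings)
```

The step-level rows remain available for granular inspection, while `app_totals` holds the cumulative value per application that the raw readings alone do not provide.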

Note	For information about importing counters to tables in the slice performance view, see Understanding the Unit of Work Details Page.

To select or remove metrics in the displayed table:

1. Click the Select table columns icon.
2. Indicate which metrics to display in the Details Table by selecting and de-selecting columns. You might need to expand counter group headings to display particular attributes that you want to include or exclude. You might also need to minimize a group of metric attributes to more easily view the full list. See Figure 1 for examples.
3. Click UPDATE.

Figure 1. Column chooser window example

The following sections provide a brief explanation of the column metrics:

Common

The common metrics vary depending on whether you are viewing a table listing application runs or units of work.

Effective Parallelism - Total runtime of all slices divided by runtime of the app, averaged over all app instances.
Host Name - The name of the machine where the application jar was launched.
ID - The numeric identifier (ID) of the application.
IP Address - The IP address of the host where the application ran.
JVM Max Memory - The maximum memory setting of the JVM where the application ran.
Owner - The owner (by name) of the application.
PID - The PID of the process where the application ran.
Slice Rate - A small line graph that plots the changing number of slices involved in an active application at different points over a short period of time. The slice rate is depicted by the vertical fluctuations of the line. This column attribute also displays a timestamped count of slices for the last point in the time range of the slice rate line graph.
Team - The team name that is associated with the application. The Owner of the team or appointed Team Leader is able to manage the team member details through a link in this column.
UoW Active Count - The number of currently active UoW (units of work) in the application.
UoW Count - The total number of UoW in the application.
UoW Failed Count - The number of failed UoW in the application.
Version - The version of the application as set by the developer.
Timeline - A timeline representation of each row with color coding to reflect the status. Hover over the graphed bar to display a pop-up of the status information.

Tip	Slice Rate is displayed only for an application instance that is still executing as of the time range endpoint for the displayed data. If you have Auto Update turned ON and an application is executing, the slice rate refreshes itself in real time until the app reaches a Finished state.
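As a worked example of the Effective Parallelism definition in the Common list, the following sketch computes the ratio for two application instances. It assumes that "averaged over all app instances" means the arithmetic mean of the per-instance ratios; the function name and the sample runtimes are hypothetical and are not taken from Driven.

```python
def effective_parallelism(instances):
    """Mean of (total slice runtime / app runtime) across application instances.

    `instances` is a list of (slice_runtimes, app_runtime) pairs, all in seconds.
    Taking the mean of per-instance ratios is an assumption about how the
    average is formed.
    """
    ratios = [
        sum(slice_runtimes) / app_runtime
        for slice_runtimes, app_runtime in instances
        if app_runtime > 0  # skip instances without a measurable runtime
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical data: two instances of the same application.
instances = [
    ([30.0, 30.0, 60.0], 60.0),  # 120 / 60 = 2.0
    ([40.0, 40.0], 20.0),        # 80 / 20 = 4.0
]
parallelism = effective_parallelism(instances)
```

A value near 1 suggests that slices ran mostly one after another, while larger values indicate that more slices ran concurrently.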

Time

Pending Time - The point in time when an application JVM is started up or a unit of work is created.
Start Time - The point in time when the first unit of work was requested to begin executing on the cluster for an application, or when the unit of work was requested to begin execution.
Submit Time - The point in time when the unit of work is actually submitted to the cluster for execution.
Run Time - The point in time when the first unit of work begins executing on the cluster for an application, or when the current unit of work began execution.
Finished Time - The point in time when the application JVM or unit of work completes or fails.

Counter Aliases

Many Apache Hadoop ecosystem projects, like Hadoop MapReduce, Apache YARN, and Apache Tez, provide counter names that differ slightly from one platform to another. In many cases, the counter "key name" is the same across platforms, but the "group name" differs, which makes comparisons across platforms difficult and tedious.

Counter aliases provide a shorthand for getting equivalent counter metrics across platforms into a single table column for easier comparison.

Bytes Read - The total number of bytes read during the execution of the application across all file systems — an alias for *.FileSystemCounter.*_BYTES_READ counters.
Bytes Written - The total number of bytes written during the execution of the application across all file systems — an alias for *.FileSystemCounter.*_BYTES_WRITTEN counters.
CPU Time - The total amount of CPU time (in milliseconds) across all slices for the application or unit of work being displayed — an alias for *.CPU_MILLISECONDS counters.
Records Read - The total number of records read during the execution of the application or unit of work — an alias for *.MAP_INPUT_RECORDS and *.REDUCE_INPUT_RECORDS.
Records Written - The total number of records written during the execution of the application or unit of work — an alias for *.MAP_OUTPUT_RECORDS, *.REDUCE_OUTPUT_RECORDS, or for Tez *.OUTPUT_RECORDS.
Task Retries - The number of retries that Hadoop executed for a task, when known.
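The alias patterns above can be read as wildcard expressions over "group name.key name" counter identifiers. The following sketch shows how such an alias could fold several platform-specific counters into one column value. It assumes shell-style wildcard semantics (via Python's `fnmatch` module), which the guide does not specify; the counter names and values are sample data, and `resolve_alias` is a hypothetical helper, not a Driven API.

```python
from fnmatch import fnmatchcase

# Alias patterns copied from the list above; wildcard semantics are an assumption.
ALIASES = {
    "Bytes Read": ["*.FileSystemCounter.*_BYTES_READ"],
    "CPU Time": ["*.CPU_MILLISECONDS"],
    "Records Read": ["*.MAP_INPUT_RECORDS", "*.REDUCE_INPUT_RECORDS"],
}

# Sample counters keyed as "<group name>.<key name>" (illustrative values).
counters = {
    "org.apache.hadoop.mapreduce.FileSystemCounter.HDFS_BYTES_READ": 1000,
    "org.apache.hadoop.mapreduce.FileSystemCounter.FILE_BYTES_READ": 250,
    "org.apache.hadoop.mapreduce.FileSystemCounter.HDFS_BYTES_WRITTEN": 400,
    "org.apache.hadoop.mapreduce.TaskCounter.CPU_MILLISECONDS": 5200,
    "org.apache.hadoop.mapreduce.TaskCounter.MAP_INPUT_RECORDS": 90,
    "org.apache.hadoop.mapreduce.TaskCounter.REDUCE_INPUT_RECORDS": 30,
}


def resolve_alias(alias, counter_values):
    """Sum every counter whose full name matches one of the alias patterns."""
    patterns = ALIASES[alias]
    return sum(
        value
        for name, value in counter_values.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


bytes_read = resolve_alias("Bytes Read", counters)
records_read = resolve_alias("Records Read", counters)
```

Because the group name is covered by the leading wildcard, the same alias would also pick up counters whose key name matches but whose group name belongs to a different platform.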

Additionally, many counters are framework specific. A few examples of counters for the Cascading framework are listed below.

CoGroup Tuples Spilled - The number of Tuples spilled to disk during the Cascading CoGroup operation — this metric can help determine if the spill threshold should be increased or decreased.
HashJoin Tuples Spilled - The number of Tuples spilled to disk during the Cascading HashJoin operation — this metric can help determine if the spill threshold should be increased or decreased.
Tuples Read - The number of Tuples read during the execution of the Cascading application or unit of work.
Tuples Trapped - The number of trapped tuples (data placed into an output file because of error in processing) during the execution of the Cascading application or unit of work.
Tuples Written - The number of Tuples written during the execution of the Cascading application or unit of work.

Duration

Duration metrics report either the total time that the application spends in a particular state or the elapsed time between one state and a later state reached while processing the application.

Duration - The time when the application is in the Started status subtracted from the time when the application reaches a finished state.
Pending:Run Duration - The time when the application is in the Pending status subtracted from the time when the application is in the Running status.
Pending Duration - The total time that the application is in Pending status.
Pending:Submit Duration - The time when the application is in the Pending status subtracted from the time when the application is in the Submitted status.
Running Duration - The total time that the application is in the Running status.
Start:Finish Duration - The time when the application is in the Started status subtracted from the time when the application reaches a finished state.
Start:Run Duration - The time when the application is in the Started status subtracted from the time when the application is in the Running status.
Started Duration - The total time that the application is in the Started status.
Submitted:Finish Duration - The time when the application is in the Submitted status subtracted from the time when the application reaches a finished state.
Submitted Duration - The total time that the application is in the Submitted status.
Total Duration - The time when the application is in the Pending status subtracted from the time when the application reaches a finished state.
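The state-to-state durations above are differences between two timestamps. The sketch below derives several of them from one hypothetical set of state timestamps; the timestamp values, the dictionary keys, and the `span` helper are illustrative assumptions, not output from Driven.

```python
from datetime import datetime

# Hypothetical timestamps for when an application entered each state.
times = {
    "pending": datetime(2024, 1, 1, 12, 0, 0),
    "started": datetime(2024, 1, 1, 12, 0, 5),
    "submitted": datetime(2024, 1, 1, 12, 0, 8),
    "running": datetime(2024, 1, 1, 12, 0, 20),
    "finished": datetime(2024, 1, 1, 12, 5, 20),
}


def span(state_times, earlier, later):
    """Elapsed time from entering `earlier` to entering `later`."""
    return state_times[later] - state_times[earlier]


durations = {
    "Pending:Submit Duration": span(times, "pending", "submitted"),
    "Pending:Run Duration": span(times, "pending", "running"),
    "Start:Run Duration": span(times, "started", "running"),
    "Start:Finish Duration": span(times, "started", "finished"),
    "Submitted:Finish Duration": span(times, "submitted", "finished"),
    "Total Duration": span(times, "pending", "finished"),
}
```

Each value is a `timedelta`, so `durations["Total Duration"].total_seconds()` gives the elapsed seconds from the Pending state to a finished state.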

Tip	See the Driven application states documentation for more information about semantics for the counters.

Understanding the App Details Page