Root Cause Analysis on Android Log Data for Failure Investigation

Root cause analysis of Android log data helps the development team identify the underlying issues behind failures and build more stable services. With a thorough understanding of each failure, the team can fix it in subsequent updates, improving the reliability and performance of their applications.

Starting Point

Approximately 200 million log entries collected from various Android devices during real-world field tests.

Objective

Identify the specific root causes of device failures so that the development team can eliminate them faster in subsequent updates.

Added Value

Reducing the search space for pinpointing the root cause of each failure from 30,000 log entries to just 50.

From Challenges to Solutions

Volume of Data

The sheer volume of log data generated by Android devices, especially during extensive field tests or widespread deployment, can overwhelm the development team, making it challenging to identify meaningful patterns or root causes.

Ambiguity in Error Messages

Some error messages in Android log data might be cryptic or lacking in detail, making it difficult to determine the precise cause of a failure without additional context or manual investigation, slowing down the root cause analysis process.

Software Fragmentation

The diverse ecosystem of Android devices, with variations in software configurations, poses a challenge for root cause analysis. Identifying consistent patterns across different devices and software versions can be complex, particularly when certain failures are specific to particular hardware or software configurations.

Find Main Services

Group various failures into clusters to identify the primary services of the Android device implicated in the root cause.
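The source does not describe how the clustering is implemented, so the following is only a minimal sketch of the idea. It assumes each failure comes with the raw log lines recorded around it, that lines follow the logcat "threadtime" layout (date, time, PID, TID, level, tag, message), and that the log tag is an acceptable stand-in for the service. Grouping by the most frequent error tag is a deliberately simple substitute for a real clustering algorithm, and the names `parse_line`, `dominant_service`, and `group_failures` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: "MM-DD HH:MM:SS.mmm  PID  TID LEVEL TAG: message"
LINE_RE = re.compile(
    r"^(\d\d-\d\d) (\d\d:\d\d:\d\d\.\d{3})\s+(\d+)\s+(\d+)\s+([VDIWEF])\s+(.*?)\s*: (.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    date, time, _pid, _tid, level, tag, message = m.groups()
    return {"time": f"{date} {time}", "level": level, "tag": tag, "message": message}


def dominant_service(failure_lines):
    """Return the tag with the most error-level (E/F) entries, or None."""
    counts = Counter()
    for line in failure_lines:
        entry = parse_line(line)
        if entry and entry["level"] in ("E", "F"):
            counts[entry["tag"]] += 1
    return counts.most_common(1)[0][0] if counts else None


def group_failures(failures):
    """Map each dominant tag to the ids of the failures it dominates.

    `failures` maps a failure id to the list of raw log lines around it.
    """
    groups = defaultdict(list)
    for failure_id, lines in failures.items():
        groups[dominant_service(lines)].append(failure_id)
    return dict(groups)
```

Calling `group_failures` on a dictionary of failure ids and their surrounding log lines returns one group per dominant tag. Failures without any error-level entries end up under the key `None`, which flags them for manual review.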

Filtering

Filter out 99.9% of the log entries, retaining only the informative ones that are crucial for identifying the root cause.
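The filtering criteria are not specified in the source. One common heuristic, sketched below purely as an illustration, is to drop messages whose template also appears in logs from normal operation, so that only unusual entries remain. The masking rule, the `max_normal_count` threshold, and the function names `template` and `filter_informative` are assumptions, not the method used in this project.

```python
import re
from collections import Counter


def template(message):
    """Mask hex values and numbers so similar messages share one template."""
    return re.sub(r"0x[0-9a-fA-F]+|\d+", "<*>", message)


def filter_informative(failure_messages, normal_messages, max_normal_count=0):
    """Keep failure-window messages whose template is rare in normal logs.

    A message is kept if its template occurs at most `max_normal_count`
    times in `normal_messages` (hypothetical threshold, default 0).
    """
    normal_counts = Counter(template(m) for m in normal_messages)
    return [
        m for m in failure_messages
        if normal_counts[template(m)] <= max_normal_count
    ]
```

Raising `max_normal_count` keeps more entries and filters less aggressively. In practice the threshold would have to be tuned against failures whose root cause is already known.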

Impact on the Development Team

Reducing the time the development team needs to identify the root cause of a failure by approximately 75%.

Technical Deep Dive

Dive deep into our work on Root Cause Analysis

Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis

Dive deep into our work on Reactive Anomaly Detection

PULL: reactive log anomaly detection based on iterative PU learning

Interested in this topic?

Reach out to discuss root cause analysis on log data