ANR Analysis Notes (1): A First Look at ANR

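Application Not Responding (ANR) occurs when certain events in an application (such as user touch events or broadcasts) are not handled on the application's main thread within the allotted time, and the system responds by terminating the application. There are two main causes:

  1. The application process itself, for example a main thread that is blocked, suspended, or stuck in an infinite loop
  2. High CPU usage by other threads in the application process, which starves the main thread of CPU time slices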

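Three common ANR types:

  1. KeyDispatchTimeout (5s by default from Google, 8s on MTK platforms): the main type; a key or touch event gets no response within the allotted time
  2. BroadcastTimeout (10s): a BroadcastReceiver fails to finish within the allotted time. The timeout is 10s for foreground broadcasts and 60s for background broadcasts; this kind of timeout shows no dialog. See BROADCAST_FG_TIMEOUT and BROADCAST_BG_TIMEOUT in AMS (ActivityManagerService)
  3. ServiceTimeout (20s): a Service fails to finish its operation within the allotted time, which is reported as a service timeout; this kind of ANR also shows no dialog. The timeout is 20s for a foreground Service and 200s for a background Service. See SERVICE_TIMEOUT and SERVICE_BACKGROUND_TIMEOUT in ActiveServices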
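
To make the thresholds above easier to compare, here is a minimal Java sketch that simply restates them as an enum. The names `AnrType`, `timeoutMillis` and `wouldTimeOut` are made up for illustration and are not framework APIs; the real constants live in the framework classes named in the list, and their values may differ between Android versions and vendors.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: restates the timeout values quoted in the list above. */
enum AnrType {
    KEY_DISPATCH(5),            // input events (the text quotes 8s on MTK platforms)
    BROADCAST_FOREGROUND(10),
    BROADCAST_BACKGROUND(60),
    SERVICE_FOREGROUND(20),
    SERVICE_BACKGROUND(200);

    private final long timeoutMillis;

    AnrType(long timeoutSeconds) {
        this.timeoutMillis = TimeUnit.SECONDS.toMillis(timeoutSeconds);
    }

    long timeoutMillis() {
        return timeoutMillis;
    }

    /** True if work that has already run for elapsedMillis exceeds this type's limit. */
    boolean wouldTimeOut(long elapsedMillis) {
        return elapsedMillis > timeoutMillis;
    }
}
```

For instance, a 50-second operation would exceed the foreground Service limit in this sketch but stay within the background Service limit.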

0x01 KeyDispatchingTimedOut

1.1 A faulty example

Consider the following faulty example:

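@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Save the list asynchronously on a worker thread
    saveListToDb();

    // Then traverse the same list on the main thread
    traverseList();
}

public void traverseList() {
    // Created on the main thread, so the posted Runnable also runs on the main thread
    Handler handler = new Handler();
    handler.post(new Runnable() {
        @Override
        public void run() {
            // Blocks the main thread until the worker thread releases mList
            synchronized (mList) {
                // todo something
            }
        }
    });
}

public void saveListToDb() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            synchronized (mList) {
                try {
                    // todo save list
                    // Simulates slow I/O while still holding the lock
                    Thread.sleep(50000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }).start();
}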

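1.2 adb logs and traces analysis

Suppose a worker thread saves data to a file or database (the sleep here simulates slow I/O) while the main thread may operate on the same lock object. Would you habitually reach for the synchronized keyword to keep the list in sync? Once the main thread and the worker thread contend for the same object, the main thread can easily block, and how long it blocks depends on when it manages to acquire the contended object. At the system level, user input gets no response during that time, and the application eventually exits with an ANR. Running the faulty code above produces ANR log output similar to the following: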
Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
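Run adb shell getprop dalvik.vm.stack-trace-file to find where the VM writes its trace file, then open /data/anr/traces.txt on the device. It shows the main thread waiting for lock <0x0af98db2> to be released, while that lock is held by thread 10.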


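Next, look at the state of the thread with tid=10: it is sleeping. This matches the example code above: the main thread is waiting for a sleeping thread to release a lock, which causes the ANR. Logs and causes in real projects are rarely this obvious, but faulty code shaped like the example is a very common scenario.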
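
The same contention can be observed off-device with plain Java threads. The sketch below is only an analogy under stated assumptions: a thread named "waiter" stands in for the Android main thread, "holder" stands in for the worker that saves the list, and `LockContentionDemo` and `observeWaiterState` are hypothetical names. It reports the waiter's state while the holder sleeps inside the synchronized block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class LockContentionDemo {

    /** Returns the waiter thread's state while the holder sleeps with the lock held. */
    static Thread.State observeWaiterState() {
        final List<String> list = new ArrayList<>();      // plays the role of mList
        final CountDownLatch lockHeld = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (list) {
                lockHeld.countDown();
                try {
                    Thread.sleep(60_000);                  // simulated slow I/O
                } catch (InterruptedException e) {
                    // Interrupted at the end of the demo: fall through and release the lock
                }
            }
        }, "holder");

        Thread waiter = new Thread(() -> {
            synchronized (list) {                          // must wait for the holder
                list.add("traversed");
            }
        }, "waiter");

        try {
            holder.start();
            lockHeld.await();                              // the holder now owns the monitor
            waiter.start();

            // Poll until the waiter is stuck on the monitor (give up after about 5s).
            long deadline = System.currentTimeMillis() + 5_000;
            while (waiter.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = waiter.getState();

            holder.interrupt();                            // end the simulated I/O
            holder.join();
            waiter.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

On Android, a main thread stuck in that state past the input timeout is what surfaces as the ANR described above.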


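1.3 Notes on key information in traces.txt

  1. pid is the process id; sysTid=pid, i.e. for the main thread the thread number equals the process number; prio=5 is the thread priority
  2. When a thread holds a lock, the trace prints - locked <0xxxxxxx>
  3. When the thread is waiting for another thread to release that lock, it prints waiting to lock <0xxxxxx>
  4. If the code calls wait(), the trace first shows locked and then prints waiting on <0xxxxxx>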
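
When scanning a long traces.txt, it can help to pull the lock id and the holding thread out of each "waiting to lock" line. The following is a rough sketch only: `TraceLockLine` is a hypothetical helper, and the exact line shape (in particular the "held by thread N" suffix and the object type in parentheses) is an assumption based on the description above, so the regular expression may need adjusting for a given Android version.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceLockLine {
    // Assumed shape: "- waiting to lock <0x0af98db2> (a java.util.ArrayList) held by thread 10"
    private static final Pattern WAITING = Pattern.compile(
            "-\\s*waiting to lock\\s*<(0x[0-9a-fA-F]+)>.*?held by thread (\\d+)");

    final String lockId;     // e.g. "0x0af98db2"
    final int holderTid;     // tid of the thread that currently owns the lock

    private TraceLockLine(String lockId, int holderTid) {
        this.lockId = lockId;
        this.holderTid = holderTid;
    }

    /** Parses one trace line; empty if it is not a "waiting to lock" line. */
    static Optional<TraceLockLine> parse(String line) {
        Matcher m = WAITING.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new TraceLockLine(m.group(1), Integer.parseInt(m.group(2))));
    }
}
```

Once the holder's tid is known, the next step is to find that thread's own stack in the same file and see what it is doing while it holds the lock.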

1.4 ANR Input event dispatching timed out Reason

Following the summary of input ANR causes at http://gityuan.com/2017/01/01/input-anr/, input ANRs fall into the following categories.

  1. No window, but a focused application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
  2. Window paused: Waiting because the [targetType] window is paused.
  3. Window not connected: Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed.
  4. Window connection dead: Waiting because the [targetType] window's input connection is [Connection.Status]. The window may be in the process of being removed.
  5. Window connection full: Waiting because the [targetType] window's input channel is full. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
  6. Key event, with a non-empty outbound queue or wait queue: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
  7. Non-key event, with a non-empty wait queue whose head event has been pending for over 500ms: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue length]. Wait queue head age: [wait time].
  • targetType: either "focused" or "touched"
  • Connection.Status: one of "NORMAL", "BROKEN", "ZOMBIE"

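So, as with the example code in 1.1, when the ANR happens during an Activity's onCreate flow you will see the "no window, but a focused application" message; when it happens in a View's OnClickListener, the log will instead report that the wait queue is not empty and its head event has been pending for over 500ms. The different messages thus roughly identify the user scenario in which the ANR occurred, which makes the offending code easier to locate.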
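
As a rough illustration of using the reason string for triage, the sketch below maps a reason to one of the seven categories by matching fixed fragments of the messages listed in this section. `InputAnrReason`, `Category` and `classify` are hypothetical names, and the exact message wording may differ across Android versions, so unmatched strings fall back to UNKNOWN.

```java
final class InputAnrReason {

    enum Category {
        NO_FOCUSED_WINDOW,
        WINDOW_PAUSED,
        CHANNEL_NOT_REGISTERED,
        CONNECTION_DEAD,
        CHANNEL_FULL,
        KEY_EVENT_PENDING,
        NON_KEY_EVENT_PENDING,
        UNKNOWN
    }

    /** Classifies an input ANR reason by the fixed wording quoted in the list above. */
    static Category classify(String reason) {
        if (reason == null) {
            return Category.UNKNOWN;
        }
        if (reason.contains("no window has focus")) {
            return Category.NO_FOCUSED_WINDOW;
        }
        if (reason.contains("window is paused")) {
            return Category.WINDOW_PAUSED;
        }
        if (reason.contains("is not registered with the input dispatcher")) {
            return Category.CHANNEL_NOT_REGISTERED;
        }
        if (reason.contains("input channel is full")) {
            return Category.CHANNEL_FULL;
        }
        if (reason.contains("Waiting to send key event")) {
            return Category.KEY_EVENT_PENDING;
        }
        if (reason.contains("Waiting to send non-key event")) {
            return Category.NON_KEY_EVENT_PENDING;
        }
        if (reason.contains("input connection is")) {
            return Category.CONNECTION_DEAD;
        }
        return Category.UNKNOWN;
    }
}
```

With this mapping, the log line from section 1.2 falls under NO_FOCUSED_WINDOW, which points to a stall during Activity startup, while a stalled click handler would land in NON_KEY_EVENT_PENDING.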

References

http://yuanfentiank789.github.io/2017/09/05/ANR%E5%88%86%E6%9E%90/
http://gityuan.com/2017/01/01/input-anr/
https://maoao530.github.io/2017/02/21/anr-analyse/
http://rayleeya.iteye.com/blog/1955657