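Application Not Responding (ANR) occurs when certain events in an application (for example user touch events or broadcasts) are not handled by the application's main thread within a specified time, and the system automatically responds by terminating the application. The problem has two main causes:
- The application process itself, for example a main thread that is blocked, suspended, or stuck in an infinite loop
- Other threads in the application process using so much CPU that the main thread cannot get a CPU time slice
Three common ANR types:
- KeyDispatchTimeout (5s by default in Google's code, 8s on MTK platforms): the main type; a key or touch event gets no response within the allotted time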
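- BroadcastTimeout (10s): a BroadcastReceiver fails to finish its work within the allotted time. The timeout is 10s for foreground broadcasts and 60s for background broadcasts; this kind of timeout does not show a dialog. See BROADCAST_FG_TIMEOUT and BROADCAST_BG_TIMEOUT in AMS.
- ServiceTimeout (20s): a Service fails to finish its operation within the allotted time, so a service timeout is reported; this kind of ANR does not show a dialog either. The timeout is 20s for a foreground Service and 200s for a background Service. See SERVICE_TIMEOUT and SERVICE_BACKGROUND_TIMEOUT in ActiveServices.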
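Each of these timeouts amounts to a deadline on a unit of work: if the handler has not finished when the deadline fires, a timeout is reported. The sketch below illustrates only that general idea in plain Java, using a scheduled task as the deadline. It is not the framework's implementation; `TimeoutWatchdogSketch`, `timedOut`, and `sleepQuietly` are hypothetical names, and the millisecond values are scaled down so the demo finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutWatchdogSketch {

    /** Runs work on the calling thread; returns true if the deadline fired before it finished. */
    static boolean timedOut(Runnable work, long timeoutMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        // Arm the deadline before the work starts.
        ScheduledFuture<?> deadline =
                timer.schedule(() -> fired.set(true), timeoutMs, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            // Work finished: disarm the deadline if it has not fired yet.
            deadline.cancel(false);
            timer.shutdownNow();
        }
        return fired.get();
    }

    /** Sleeps without forcing callers to handle InterruptedException. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Hypothetical scaled-down values: a quick handler and a slow one.
        System.out.println("fast handler timed out: " + timedOut(() -> sleepQuietly(10), 500));
        System.out.println("slow handler timed out: " + timedOut(() -> sleepQuietly(600), 100));
    }
}
```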
0x01 KeyDispatchingTimedOut
1.1 Faulty example
Consider the following faulty example:
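A minimal sketch of this kind of mistake, written as plain Java so that it runs off-device; it is an illustration under assumptions, not the article's original listing. A worker thread holds the monitor of a shared list while sleeping (standing in for slow file or database I/O), and the calling thread, playing the role of the main thread, blocks on the same monitor. The names `LockContentionSketch` and `measureBlockedMillis` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class LockContentionSketch {
    // Shared list; both threads synchronize on its monitor.
    private static final List<String> ITEMS = new ArrayList<>();

    /** Returns roughly how long (ms) the caller waited to enter synchronized (ITEMS). */
    static long measureBlockedMillis(long workerHoldMs) {
        CountDownLatch workerHasLock = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            synchronized (ITEMS) {
                workerHasLock.countDown();
                try {
                    Thread.sleep(workerHoldMs); // stands in for slow I/O done while holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "save-worker");
        worker.start();
        try {
            workerHasLock.await(); // make sure the worker owns the monitor first
            long start = System.nanoTime();
            synchronized (ITEMS) { // the "main thread" blocks here until the worker leaves
                ITEMS.add("clicked");
            }
            long blockedMs = (System.nanoTime() - start) / 1_000_000;
            worker.join();
            return blockedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("caller blocked for about " + measureBlockedMillis(300) + " ms");
    }
}
```

On a device, the same wait inside onCreate or an OnClickListener would stall the main thread for as long as the worker holds the lock.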
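1.2 adb logs and traces analysis
Suppose a worker thread saves data to a file or database (here a sleep call stands in for the slow I/O), and the main thread may operate on the same lock object at the same time. Would you habitually reach for the synchronized keyword to keep the list in sync? Once the main thread and the background thread contend for the same object, the main thread can easily block, and how long it blocks depends on when it finally acquires the contended object. At the system level, user input gets no response, and the application eventually exits with an ANR. Running the faulty example produces an ANR log message similar to this one:
Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)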
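Query the VM's trace output path with adb shell getprop dalvik.vm.stack-trace-file, then inspect the contents of /data/anr/traces.txt on the device. The main thread is waiting for lock <0x0af98db2> to be released, and that lock is currently held by thread 10.
Looking next at the thread with tid=10, its state is sleeping, which matches the example code: the main thread is waiting for a sleeping thread to release a lock, and that causes the ANR. Logs and causes in a real project are not always this obvious, but faulty code of this shape is a very common scenario.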
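The "blocked thread, lock, owner" relationship read from traces.txt has a rough analogue on a desktop JVM, which can help when reproducing a contention bug off-device. The sketch below uses the standard `java.lang.management` API to report which thread owns the monitor a blocked thread is waiting for. It is only an analogy: as far as I know this API is not part of the Android SDK, and `BlockedThreadInspector`, `describeBlocked`, and `demo` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class BlockedThreadInspector {

    /** Describes the monitor a BLOCKED thread is waiting for, or returns null otherwise. */
    static String describeBlocked(Thread t) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(t.getId());
        if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
            return null;
        }
        return info.getThreadName() + " waiting to lock " + info.getLockName()
                + " held by " + info.getLockOwnerName();
    }

    /** Sets up one holder and one waiter on the same monitor and reports on the waiter. */
    static String demo() {
        final Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the report has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; acquiring the monitor is the point.
            }
        }, "waiter");
        try {
            holder.start();
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(5); // wait until the waiter is parked on the monitor
            }
            String report = describeBlocked(waiter);
            release.countDown();
            holder.join();
            waiter.join();
            return report;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during demo", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```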
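1.3 Notes on key information in traces.txt
- pid is the process id; sysTid=pid, because here the main thread's thread id equals the process id; prio=5 is the thread priority
- When a thread holds a lock, the trace prints - locked <0xxxxxxx>
- When a thread is waiting for another thread to release a lock, the trace prints waiting to lock <0xxxxxx>
- If the code calls wait(), the trace first shows locked and then prints waiting on <0xxxxxx>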
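The three lock annotations above are regular enough to be picked out mechanically. The following is a minimal sketch of a scanner that extracts them from trace text with a regular expression. `TraceLockScanner` and `scan` are hypothetical names, and the input in `main` is a short hand-written fragment modeled on the lines described above, not captured device output; real traces may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceLockScanner {
    // Matches "- locked <0x..>", "- waiting to lock <0x..> ... held by thread N",
    // and "- waiting on <0x..>". The owner thread id is optional.
    private static final Pattern LOCK_LINE = Pattern.compile(
            "-\\s*(locked|waiting to lock|waiting on)\\s*<(0x[0-9a-fA-F]+)>"
                    + "(?:.*?held by thread (\\d+))?");

    /** Returns one summary string per lock annotation found in the trace text. */
    static List<String> scan(String traceText) {
        List<String> out = new ArrayList<>();
        for (String line : traceText.split("\n")) {
            Matcher m = LOCK_LINE.matcher(line);
            if (!m.find()) {
                continue; // not a lock annotation line
            }
            String owner = m.group(3) == null ? "" : " owner-tid=" + m.group(3);
            out.add(m.group(1) + " " + m.group(2) + owner);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written illustrative fragment (assumption), not real device output.
        String sample = "\"main\" prio=5 tid=1 Blocked\n"
                + "  - waiting to lock <0x0af98db2> (a java.lang.Object) held by thread 10\n"
                + "\"Thread-2\" prio=5 tid=10 Sleeping\n"
                + "  - locked <0x0af98db2> (a java.lang.Object)\n";
        for (String entry : scan(sample)) {
            System.out.println(entry);
        }
    }
}
```

Pairing a "waiting to lock" entry with the "locked" entry that has the same address points at the thread worth inspecting next.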
1.4 Reasons for "Input event dispatching timed out" ANRs
Based on the summary of input ANR causes at http://gityuan.com/2017/01/01/input-anr/, input ANRs fall mainly into the following categories.
- No window, but an application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
- Window paused: Waiting because the [targetType] window is paused.
- Window not connected: Waiting because the [targetType] window’s input channel is not registered with the input dispatcher. The window may be in the process of being removed.
- Window connection dead: Waiting because the [targetType] window’s input connection is [Connection.Status]. The window may be in the process of being removed.
- Window connection full: Waiting because the [targetType] window’s input channel is full. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
- Key event, with a non-empty outbound queue or wait queue: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
- Non-key event, with a non-empty wait queue whose head event was dispatched more than 500ms ago: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue length]. Wait queue head age: [wait duration].
- targetType: either "focused" or "touched"
- Connection.Status: one of "NORMAL", "BROKEN", "ZOMBIE"
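So, as with the example code in 1.1, when the ANR occurs during an Activity's onCreate flow, the log shows the "no window, but an application" message; when it occurs in a View's OnClickListener, the log shows the "non-empty wait queue whose head event was dispatched more than 500ms ago" message. The different log messages let you roughly identify the user scenario in which the ANR occurred, which in turn makes it easier to locate the offending code.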
References
http://yuanfentiank789.github.io/2017/09/05/ANR%E5%88%86%E6%9E%90/
http://gityuan.com/2017/01/01/input-anr/
https://maoao530.github.io/2017/02/21/anr-analyse/
http://rayleeya.iteye.com/blog/1955657