Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?

Can I use Neural Engine
to run my neural networks
on A11 devices?
Koan-Sin Tan

freedom@computer.org

Hsinchu Coding Serfs Meeting, Nov 2018

https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/5
• AnandTech is one of my favorite tech sites. They usually provide good technical analysis

• E.g., Apple’s CPUs

• cache sizes

• execution units

• latencies of various instructions

• Not good enough for NN accelerators on mobile phones

• floating-point VGG16, Inception V3, and ResNet34?

• Come on, are we still in the Neolithic era?
ANE on A12, how about A11?

Why I said VGG16 is Neolithic Era
• Lightweight models are there

• MobileNet V1 can reach roughly the same top-1 accuracy, even when quantized to uint8

• MobileNet V2 could have better
top-1 accuracy

• MnasNet could be better than MobileNet V1

• Classification, object detection, segmentation, etc.

• 8-bit quantization is good enough for many cases
https://github.com/tensorflow/models/raw/master/research/slim/nets/mobilenet/madds_top1_accuracy.png
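To make the 8-bit quantization point concrete, here is a minimal sketch of per-tensor affine quantization to uint8 in plain Python. It assumes the common scale and zero-point scheme; the function names `quantize` and `dequantize` are hypothetical and not taken from any particular framework.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using one scale and zero point per tensor."""
    qmin, qmax = 0, (1 << num_bits) - 1
    # The represented range must include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(qmin - lo / scale))
    q = [min(qmax, max(qmin, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 2.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the error budget a quantized model has to tolerate.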

How to use Neural Engine
• According to Apple:

• A11: 600 G ops per second, A12: 5 T ops per second

• Yes, it's enabled by default on A12 devices. Apps built on top of Core ML before iOS 12 should be able to use it automatically. But not on A11 devices.

• How to verify it?

• MLModelConfiguration [1]: property

@property(readwrite) MLComputeUnits computeUnits;

• There is usesCPUOnly for VNRequest in iOS 11, but nothing like MLComputeUnits

• See my example [2]

[1] https://developer.apple.com/documentation/coreml/mlmodelconfiguration?language=objc

[2] https://github.com/freedomtan/coremlbenchmark/

Why not VNRequest?
• Since I mentioned VNRequest in Vision.framework, why not VNCoreMLRequest?

• Yes, I have written simple VNCoreMLRequest-based apps before, in both Swift and Objective-C [1][2].

• The interface is simplified, and it crops and scales images for you.

• But yes, the image operations take time.

• This actually reminds us of an important system software issue.

• Modern cellphone SoCs use DVFS and all kinds of energy-saving techniques extensively. How can we get good performance?

• Inference with camera on is usually faster than with camera off!!!
[1] https://github.com/freedomtan/SimpleInceptionV3/

[2] https://github.com/freedomtan/SimpleInceptionV3-ObjC
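Because DVFS can change clock speeds between runs, single-shot timings are noisy. The following is a minimal sketch of one common mitigation, warm-up runs followed by a median over repeated runs. It is an illustration, not the method used in the benchmark apps above; `benchmark` and `run_once` are hypothetical names, and `run_once` stands in for one inference call.

```python
import statistics
import time


def benchmark(run_once, warmup=5, runs=30):
    """Time run_once() after warm-up iterations and summarize the samples."""
    # Warm-up iterations give the governor a chance to raise clocks
    # and let caches and lazy initialization settle.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

A large gap between `min_s` and `max_s` is a hint that frequency scaling or thermal throttling affected the measurement.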

Neural Engine in Action
• H11ANEServicesThread

• A12 is for iPhone11,x

• No H10ANEServicesThread

• So, who started H11ANEServicesThread? There is nothing named H11 in /System/Library/Frameworks/CoreML.framework/CoreML

• It seems it’s in /System/Library/PrivateFrameworks/ANEServices.framework/ANEServices
• A12 devices only

iPhone Xs Max
default 17:17:14.002705 +0800 kernel IOReturn H11ANEIn::ANE_ProcessDestroy_gated(H11ANEProcessDestroyArgs *, bool, uint32_t *) :
H11ANEIn::ANE_ProgramDestroy_gated WARN: Freeing intermediate buffer inside ProcessDestroy
H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a03aa2e112. Num clients for program=0
H11ANE:ANE_ProcessDestroy_gated Removed client aned from programHandle=0x8a02e50c71e. Num clients for program=0
default 17:17:14.024969 +0800 kernel IOReturn H11ANEInUserClient::ANE_PowerOff() - client aned requesting Power Off
default 17:17:14.025291 +0800 kernel IOReturn H11ANEIn::setPowerStateGated(unsigned long, IOService *) : H11ANEIn::setPowerStateGated: 0
default 17:17:14.026850 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - CSNE_CMD_POWER_DOWN command completed:
res=0x00000000
default 17:17:14.026880 +0800 kernel IOReturn H11ANEIn::ANE_deInit() : H11ANEIn::ANE_deInit - ANECPU in WFI after CSNE_CMD_SUSPEND/
CISP_CMD_POWER_DOWN. retries=0 ASCWRAP_IDLE_STATUS = 0x2d
default 17:17:14.039520 +0800 kernel IOReturn H11ANEIn::ANE_HandlePowerStateChecksForClient() : INFO: H11ANEIn: ANE power status:
isPowered: 0, fDeInitInProgress: 0, fFirmwareTimeout: 0
default 17:17:14.039563 +0800 kernel IOReturn H11ANEIn::ANE_UserClientCleanup_gated(void *) : Info: H11ANEIn: Skipping user client
cleanup for client (<private>) as power is already off
default 17:17:14.039723 +0800 kernel virtual IOReturn H11ANEInUserClient::clientClose() - aned
default 17:17:14.039749 +0800 kernel virtual void H11ANEInUserClient::free() - Freeing UserClient for process: aned (pid 191)

iPhone 8 Plus
default 17:08:51.256253 +0800 kernel ISPCPU: CmdTurnOffDevicePower: TS: 2.901495 Disable CAM0_SHUTDOWN=0
default 17:08:51.256277 +0800 kernel ISPCPU: Addr: 0x00000002122a8000
default 17:08:51.258444 +0800 kernel ISPCPU: TurnOffPower:DONE TS: 2.903766 rail: 0x5, ch: 0, cameraPowerBitEnable:
0x7e
default 17:08:51.258684 +0800 kernel AppleH10CamIn::ISP_PPMAdmissionCheck_gated: subClientID=1; budgetReq=0;
budgetAlloc=0; result=0x00000000
default 17:08:51.258726 +0800 kernel AppleH10CamIn::ISP_StopCamera_gated: subClientID=1; channel=0; budgetReq=0;
budgetAlloc=0; result=0x00000000, numPreviewFrames=72, numStillCaptureFrames:0
default 17:08:51.258813 +0800 kernel ISPCPU: [ISP: 2.904275] CH = 0 CMD = 0x0104 [CISP_CMD_CH_BUFFER_RETURN]
default 17:08:51.266156 +0800 kernel AppleH10CamIn::ISP_FlushInactiveDARTMappings: 0x00000000
default 17:08:51.266234 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (sent)
default 17:08:51.267404 +0800 mediaserverd H10ISPServicesRemote: SetProperty 2 (reply=0x00000000)
default 17:08:51.272115 +0800 kernel ISPCPU: [ISP: 2.917542] CH = 0 CMD = 0x820b
[CISP_CMD_APPLE_CH_AE_TILES_MATRIX_METADATA_ENABLE]
default 17:08:51.273311 +0800 kernel ISPCPU: [ISP: 2.918641] CH = 0 CMD = 0x0130 [CISP_CMD_CH_GENERAL_PROCESS_STOP]
default 17:08:51.276237 +0800 kernel AppleH10CamIn::ISP_ReleaseChannel_gated - channel: 0 (process: mediaserverd)

iPhone 6s
default 17:18:52.814006 +0800 kernel AppleH6CamIn::setPowerStateGated: 1
default 17:18:52.814054 +0800 kernel AppleH6CamIn::power_on_hardware
default 17:18:52.910762 +0800 kernel AppleH6CamIn::MotionDataEnable: Enabling for Endpoint 0
default 17:18:52.924652 +0800 mediaserverd FigSignalError: -12785, invalidated
default 17:18:52.954154 +0800 mediaserverd FigSignalError: -12785, invalidated
default 17:18:52.954361 +0800 kernel AppleH6CamIn::ISP_SelectBestMIPIFrequencyIndex_gated - channel: 0, currentRawBitDepth: 1, index: 2
default 17:18:53.118463 +0800 kernel AppleH6CamIn::ISP_CopySetfile_gated (camChan=0)
default 17:19:12.307839 +0800 kernel AppleH6CamInUserClient::free - Freeing UserClient for process: mediaserverd (pid 2465)
default 17:19:12.308025 +0800 kernel AppleH6CamIn::setPowerStateGated: 0
default 17:19:12.308185 +0800 kernel AppleH6CamIn::power_off_hardware
default 17:19:12.321478 +0800 kernel AppleH6CamIn::MotionDataDisable: Enabling for Endpoint 0
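The comparison across the three logs above boils down to which driver families show up on each device. A minimal sketch of counting them with a regular expression follows; it assumes the driver names follow the `H<generation><block>` pattern seen in these excerpts (for example H11ANE, AppleH10Cam, H10ISP), and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches names such as H11ANEIn, AppleH10CamIn, H10ISPServicesRemote, AppleH6CamIn.
DRIVER_RE = re.compile(r"\b(?:Apple)?(H\d+)(ANE|Cam|ISP)")


def count_driver_families(log_text):
    """Count log lines that mention each hardware driver family."""
    counts = Counter()
    for line in log_text.splitlines():
        # Use a set so that a family is counted at most once per line.
        for generation, block in set(DRIVER_RE.findall(line)):
            counts[generation + block] += 1
    return counts
```

Run over a captured log, an A12 device is expected to report an `H11ANE` entry, while the iPhone 8 Plus and iPhone 6s excerpts only contain camera and ISP families.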

iPhone Xs Max iPhone 8 Plus
https://github.com/freedomtan/TestANE/

/* Generated by RuntimeBrowser
Image: /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
*/
@interface _ANEDeviceInfo : NSObject
+ (id)bootArgs;
+ (id)buildVersion;
+ (bool)hasANE;
+ (bool)isInternalBuild;
+ (bool)precompiledModelChecksDisabled;
@end
https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/AppleNeuralEngine.framework/_ANEDeviceInfo.h

size -l -x -m /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
Segment __TEXT: 0x11000 (vmaddr 0x1abe22000 fileoff 0)
Section __text: 0xb728 (addr 0x1abe23d18 offset 7448)
Section __auth_stubs: 0x3d0 (addr 0x1abe2f440 offset 54336)
Section __cstring: 0xb87 (addr 0x1abe2f810 offset 55312)
Section __objc_methname: 0x10a5 (addr 0x1abe30397 offset 58263)
Section __objc_classname: 0x140 (addr 0x1abe3143c offset 62524)
Section __objc_methtype: 0x498 (addr 0x1abe3157c offset 62844)
Section __gcc_except_tab: 0x8cc (addr 0x1abe31a14 offset 64020)
Section __const: 0xd0 (addr 0x1abe322e0 offset 66272)
Section __oslogstring: 0x8d0 (addr 0x1abe323b0 offset 66480)
Section __unwind_info: 0x330 (addr 0x1abe32c80 offset 68736)
Section __eh_frame: 0x50 (addr 0x1abe32fb0 offset 69552)
total 0xf2e8
Segment __DATA: 0xe00 (vmaddr 0x1ba4ef3b8 fileoff 69632)
Section __objc_selrefs: 0x3e0 (addr 0x1ba4ef3b8 offset 69632)
Section __objc_protorefs: 0x10 (addr 0x1ba4ef798 offset 70624)
Section __objc_classrefs: 0x1b8 (addr 0x1ba4ef7a8 offset 70640)
Section __objc_superrefs: 0x38 (addr 0x1ba4ef960 offset 71080)
Section __objc_ivar: 0x60 (addr 0x1ba4ef998 offset 71136)
Section __objc_data: 0x4b0 (addr 0x1ba4ef9f8 offset 71232)
Section __data: 0x228 (addr 0x1ba4efea8 offset 72432)
Section __auth_ptr: 0x8 (addr 0x1ba4f00d0 offset 72984)
Section __bss: 0xe0 (addr 0x1ba4f00d8 offset 0)
total 0xe00
…

otool -o /tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
/tmp/arm64e/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine:
Contents of (__DATA_CONST,__objc_classlist) section
00000001b7a76a78 0x80001ba4efa20
00000001b7a76a80 0x80001ba4efa70
00000001b7a76a88 0x80001ba4efa98
00000001b7a76a90 0x80001ba4efae8
…
~/work/ios-hacking/tools/jtool -d objc /tmp/arm64/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine
Fat binary, big-endian, 1 architectures: will auto-process this architecture
arm64
_ANEDeviceInfo
_ANEDataReporter
_ANEProgramForEvaluation
_ANEModel
_ANEHashEncodin
_ANERequest
_ANELog
_ANEQoSMapper
_ANEStrings
_ANEDaemonConnection
_ANEIOSurfaceObject
_ANEDeviceController
_ANEClient
_ANEErrors
_ANECloneHelper
http://www.newosxbook.com/tools/jtool.html

Mach-O Headers
• Mac OS X ABI Mach-O File Format Reference: no longer available on Apple's web site, but easy to find with a web search.

• headers: /usr/include/mach-o/loader.h

• objc runtime

• https://opensource.apple.com/source/objc4/objc4-723/, https://opensource.apple.com/tarballs/objc4/objc4-723.tar.gz
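As a companion to the header files listed above, here is a minimal sketch that reads the first fields of a Mach-O file in Python. The constants and field order follow `mach_header_64` in mach-o/loader.h and `fat_header` in mach-o/fat.h; the function name `parse_header` is hypothetical, and only 64-bit little-endian thin files and big-endian fat headers are handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF     # 64-bit Mach-O, stored little-endian on arm64
FAT_MAGIC = 0xCAFEBABE       # fat (universal) header, always big-endian
CPU_TYPE_ARM64 = 0x0100000C  # CPU_TYPE_ARM | CPU_ARCH_ABI64

MACH_HEADER_64_FIELDS = (
    "magic", "cputype", "cpusubtype", "filetype",
    "ncmds", "sizeofcmds", "flags", "reserved",
)


def parse_header(data):
    """Return the header fields of a fat or thin 64-bit Mach-O image."""
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be == FAT_MAGIC:
        (nfat_arch,) = struct.unpack_from(">I", data, 4)
        return {"kind": "fat", "nfat_arch": nfat_arch}
    (magic_le,) = struct.unpack_from("<I", data, 0)
    if magic_le == MH_MAGIC_64:
        values = struct.unpack_from("<IiiIIIII", data, 0)
        header = dict(zip(MACH_HEADER_64_FIELDS, values))
        header["kind"] = "thin64"
        return header
    raise ValueError("not a fat or 64-bit little-endian Mach-O file")
```

The fat branch corresponds to the "Fat binary, big-endian" message that jtool prints, and `ncmds` tells you how many load commands follow the thin header.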

Dive a bit deeper into Core ML
• Frameworks and some binaries used to be shipped unstripped as part of the iPhoneOS SDK in Xcode. Not anymore; most framework binaries are now in dyld_shared_cache.

• Fortunately, it’s quite easy to inspect the iOS file system nowadays. Apple stopped encrypting .ipsw files with iOS 10 beta (more than 2 years ago). So, get an .ipsw, unzip it (remember, it's a .zip file), then mount the largest .dmg (this needs extra steps on Windows and Linux, though). E.g.,

1. get iOS 12.0 ipsw for iPhone Xs Max [1]. See [2] for other firmwares.

2. unzip it.

3. mount 048-10782-224.dmg, that's it. You can see the whole filesystem used by
iPhone Xs Max.

• Thus, we can get the /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* we want

[1] http://updates-http.cdn-apple.com/2018FallFCS/fullrestores/091-65188/11BE19F6-AC8E-11E8-A312-F5CEDE149863/iPhone11,4,iPhone11,6_12.0_16A366_Restore.ipsw
[2] https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x
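The unzip step above can be sketched with Python's standard `zipfile` module. This assumes, as the slide says, that the root filesystem is simply the largest .dmg inside the archive; the function names are hypothetical, and mounting the extracted image is left to the platform's own tools.

```python
import zipfile


def largest_dmg(ipsw_path):
    """Return the archive member name of the largest .dmg inside an .ipsw."""
    with zipfile.ZipFile(ipsw_path) as archive:
        dmgs = [info for info in archive.infolist()
                if info.filename.lower().endswith(".dmg")]
        if not dmgs:
            raise ValueError("no .dmg found in " + ipsw_path)
        # file_size is the uncompressed size recorded in the zip directory.
        return max(dmgs, key=lambda info: info.file_size).filename


def extract_largest_dmg(ipsw_path, dest_dir):
    """Extract the largest .dmg into dest_dir and return the extracted path."""
    name = largest_dmg(ipsw_path)
    with zipfile.ZipFile(ipsw_path) as archive:
        return archive.extract(name, dest_dir)
```

Only the one large image is extracted, which avoids unpacking the smaller ramdisk images that are not needed here.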

Dive a bit deeper into Core ML
• If you are on macOS and have Xcode installed, there are some binaries with symbols in ~/Library/Developer/Xcode/iOS DeviceSupport/12.1 (16B92) arm64e/

• What do I mean by “some”? E.g., /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/XPCServices/ANECompilerService.xpc/ANECompilerService exists on A12 devices, but not in Xcode’s support library

• Yes, we can find /System/Library/Frameworks/CoreML.framework/CoreML
• Even /System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm* is there

Extract binaries from dyld_shared_cache
• jtool can do it for you. E.g.,

• list

~/work/ios-hacking/tools/jtool -l /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
• extract

~/work/ios-hacking/tools/jtool -e /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine /Volumes/Peace16A366.D331OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64e
Extracting /System/Library/PrivateFrameworks/AppleNeuralEngine.framework/AppleNeuralEngine at 0x2be22000 into dyld_shared_cache_arm64e.AppleNeuralEngine
• dyld source code

• https://opensource.apple.com/source/dyld/dyld-551.4/, https://opensource.apple.com/tarballs/dyld/dyld-551.4.tar.gz

• Read dyld source and [1] for more about dyld_shared_cache

[1] https://iphonedevwiki.net/index.php/Dyld_shared_cache
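For readers who want to see what a listing tool has to do, here is a minimal sketch of reading the image table of a dyld shared cache in Python. The struct layouts are my reading of `dyld_cache_header` and `dyld_cache_image_info` in dyld_cache_format.h from the dyld source; treat them as assumptions to verify against the source, especially for newer cache formats. The function name `list_images` is hypothetical.

```python
import struct

# Leading fields of dyld_cache_header:
# magic[16], mappingOffset, mappingCount, imagesOffset, imagesCount
HEADER = struct.Struct("<16sIIII")
# dyld_cache_image_info: address, modTime, inode, pathFileOffset, pad
IMAGE_INFO = struct.Struct("<QQQII")


def list_images(cache):
    """Return (arch, [(load_address, path), ...]) for a cache held in bytes."""
    magic, _, _, images_offset, images_count = HEADER.unpack_from(cache, 0)
    if not magic.startswith(b"dyld_v"):
        raise ValueError("not a dyld shared cache")
    # The magic looks like b"dyld_v1  arm64e\0"; the last word is the architecture.
    arch = magic.rstrip(b"\0").split()[-1].decode()
    images = []
    for index in range(images_count):
        entry = images_offset + index * IMAGE_INFO.size
        address, _, _, path_offset, _ = IMAGE_INFO.unpack_from(cache, entry)
        end = cache.index(b"\0", path_offset)  # paths are NUL-terminated C strings
        images.append((address, cache[path_offset:end].decode()))
    return arch, images
```

A real cache is large, so passing an `mmap` object instead of reading the whole file into `bytes` is likely the more practical choice; `struct.unpack_from` accepts both.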

What to read beyond Apple’s docs
• https://www.theiphonewiki.com, e.g., https://www.theiphonewiki.com/wiki/Firmware/iPhone/12.x

• http://iphonedevwiki.net/index.php/Main_Page, e.g., http://iphonedevwiki.net/index.php/Reverse_Engineering_Tools

• http://newosxbook.com/index.php, e.g., http://newosxbook.com/index.php?page=notes

• https://papers.put.as

Kernel side
• So, how about extracting the ANE-related stuff and just putting it onto A11 devices?

• Well, look into the kernelcache of A11 and A12 devices:

• As expected, we can see lots of H11ANE information in the A12 kernelcache

• The A11 kernelcache does mention H11ANE several times, but it seems the important modules are not there.

• So, I guess if we don’t jailbreak and root, we are out of luck!

Isn’t XNU (Darwin) source code open?
• Well, there are more than 200 kernel modules, and only some of them are open

$ ~/work/ios-hacking/tools/jtool2 -k ../../iphonex/ipsw/kernelcache.release.iphone10b
0xfffffff00583c000:com.apple.kpi.mach
0xfffffff00583c080:com.apple.kpi.private
0xfffffff00583c100:com.apple.kpi.unsupported
0xfffffff00583c180:com.apple.kpi.iokit
0xfffffff00583c200:com.apple.kpi.libkern
0xfffffff00583c280:com.apple.kpi.bsd
0xfffffff00583c300:com.apple.iokit.IONetworkingFamily
0xfffffff00583de00:com.apple.iokit.IOTimeSyncFamily
0xfffffff0058416c0:com.apple.iokit.IOSlowAdaptiveClockingFamily
0xfffffff005841c40:com.apple.iokit.IOStorageFamily
0xfffffff005842e80:com.apple.iokit.IOReportFamily
0xfffffff005843680:com.apple.driver.AppleARMPlatform
0xfffffff00584cd80:com.apple.driver.AppleSamsungSPI
0xfffffff00584dd00:com.apple.kpi.dsep
0xfffffff00584dd80:com.apple.kec.corecrypto
…
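To count the modules or search them by name, the listing above can be parsed with a few lines of Python. This sketch assumes the `address:bundle.id` line format shown in the jtool2 output; the function names are hypothetical.

```python
def parse_kext_list(text):
    """Parse 'address:bundle.id' lines into (address, bundle_id) tuples."""
    kexts = []
    for line in text.splitlines():
        address, separator, bundle_id = line.strip().partition(":")
        # Skip the command line, the trailing ellipsis, and anything else
        # that does not look like a listing entry.
        if separator and address.startswith("0x"):
            kexts.append((int(address, 16), bundle_id))
    return kexts


def find_kexts(kexts, needle):
    """Return bundle identifiers containing needle, ignoring case."""
    return [bundle_id for _, bundle_id in kexts
            if needle.lower() in bundle_id.lower()]
```

With the full listings from an A11 and an A12 kernelcache, `len(parse_kext_list(...))` gives the module count, and `find_kexts` makes it easy to compare which bundle identifiers appear on one device but not the other.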
