2. About Me – Kazuaki Ishizaki
▪ Researcher at IBM Research – Tokyo
https://ibm.biz/ishizaki
– Compiler optimization, language runtime, and parallel processing
▪ Apache Spark committer since September 2018 (SQL module)
▪ Working on IBM Java (now OpenJ9) since 1996
– Technical lead for Just-in-time compiler for PowerPC
▪ ACM Distinguished Member
▪ SNS
– @kiszk
– https://www.slideshare.net/ishizaki/
2 Make AI ecosystem more interoperable - Kazuaki Ishizaki
3. Agenda
▪ Motivation
▪ What inhibits interoperability?
– Endianness on each machine
▪ What is endian?
▪ What happens in a program?
▪ How to find and fix issues?
▪ How to keep interoperability in the AI ecosystem
4. Very Impressive Performance Improvement on x86
▪ Improve performance of Spark with Python by over 100x
Source: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
Apache Spark uses Apache Arrow, a cross-language development platform for in-memory analytics
5. I Want to Do This on IBM Z
$ bin/pyspark
...
>>> df.show()
6. Oh!!!
$ bin/pyspark
...
>>> df.show()
...
java.lang.IllegalStateException: Arrow only runs on LittleEndian systems…
...
>>>
7. Apache Arrow supported only Little Endian
8. One Pager for Current AI Ecosystem
▪ Data can be exchanged only among little-endian machines (e.g., x86, Arm, PowerLinux, …)
[Figure labels: PowerLinux, Arm]
Origin: https://www.dremio.com/webinars/apache-arrow-in-theory-practice/
9. One Pager for Expected AI Ecosystem
▪ Data can be exchanged among machines of both endians (e.g., x86, Arm, s390x, PowerLinux, …)
[Figure labels: PowerLinux, Arm, s390x]
Origin: https://www.dremio.com/webinars/apache-arrow-in-theory-practice/
10. What is Endian?
▪ Data layout in memory
– Example: a 32-bit integer value, 0x01020304
11. What is Endian?
▪ Data layout in memory
– Example: a 32-bit integer value, 0x01020304
Little endian (x86_64, ppc64le, …)
addr:          10 11 12 13
memory layout: 04 03 02 01
12. What is Endian?
▪ Data layout in memory
– Example: a 32-bit integer value, 0x01020304
addr:                               10 11 12 13
Little endian (x86_64, ppc64le, …): 04 03 02 01
Big endian (s390x):                 01 02 03 04
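The layout above can be checked on a real machine. The following is a minimal sketch, not code from the talk; `memory_layout` and `host_is_little_endian` are hypothetical helper names introduced only for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: the four bytes of `value` in increasing address order.
std::string memory_layout(uint32_t value) {
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));  // copy the in-memory representation
  char text[3 * sizeof(value)];               // "xx xx xx xx" plus the terminator
  std::snprintf(text, sizeof(text), "%02x %02x %02x %02x",
                bytes[0], bytes[1], bytes[2], bytes[3]);
  return text;
}

// Hypothetical helper: true when the least significant byte sits at the lowest address.
bool host_is_little_endian() {
  const uint32_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

Calling `memory_layout(0x01020304)` should give "04 03 02 01" on a little-endian host such as x86_64 and "01 02 03 04" on a big-endian host such as s390x.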
13. Why Do Programs Usually Work Well?
▪ Programs work without special care if
– They never explicitly access memory as a subset or a superset of a datum (for example, reading one byte of a 32-bit value)
– They are self-contained (no data exchange with other machines)
int32_t a, b;
...
int32_t c = a + b;
int16_t d = static_cast<int16_t>(c) + 1;
int32_t e = static_cast<int32_t>(d);
...
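To illustrate the first condition, the sketch below contrasts arithmetic on a value with a read of a subset of its bytes from memory. The function names are hypothetical and chosen only for this example.

```cpp
#include <cstdint>
#include <cstring>

// Operates on the value itself: the result is the same on every endian.
uint16_t low_half_by_value(uint32_t value) {
  return static_cast<uint16_t>(value & 0xFFFFu);
}

// Also operates on the value: a shift does not depend on memory layout.
uint16_t high_half_by_value(uint32_t value) {
  return static_cast<uint16_t>(value >> 16);
}

// Reads a subset of the bytes from memory: the result depends on the host endian.
uint16_t first_two_bytes_in_memory(uint32_t value) {
  uint16_t half;
  std::memcpy(&half, &value, sizeof(half));  // copies the two lowest-addressed bytes
  return half;
}
```

For 0x01020304, the first two functions return 0x0304 and 0x0102 everywhere, while the third returns 0x0304 on a little-endian host and 0x0102 on a big-endian host.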
14. How to Find Issues on Different Endians?
▪ Find bad smells in source code
15. Does It Have a Bad Smell?
▪ Read a 32-bit value from int8 data via reinterpret_cast
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
printf("%08x", i32);
16. Results are Different on Different Endian Machines
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
printf("%08x", i32);
Little endian: 01020304
Big endian:    04030201
17. Why Does the Problem Occur?
▪ Processors with different endians interpret the same memory sequence in different ways
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
printf("%08x", i32);
Little endian: memory layout at i8p 04 03 02 01 → 01020304
Big endian:    memory layout at i8p 04 03 02 01 → 04030201
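Both interpretations can be reproduced on any host by assembling the value with shifts instead of loading it from memory. This is an illustrative sketch; `read_as_little_endian` and `read_as_big_endian` are hypothetical names that model what each kind of processor would load.

```cpp
#include <cstdint>

// Models a little-endian load: the lowest address holds the least significant byte.
uint32_t read_as_little_endian(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0])) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Models a big-endian load: the lowest address holds the most significant byte.
uint32_t read_as_big_endian(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         (static_cast<uint32_t>(p[3]));
}
```

With the bytes 04 03 02 01, the first function yields 0x01020304 and the second yields 0x04030201, matching the two results on the slide.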
18. Support Both Endians
▪ Swap data for big endian
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
#if !defined(__LITTLE_ENDIAN__)
i32 = __builtin_bswap32(i32);
#endif
printf("%08x", i32);
Little endian: 01020304
Big endian:    01020304
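A variant of the same swap can be sketched with `memcpy`, which avoids the alignment and aliasing assumptions of `reinterpret_cast`. This sketch assumes GCC or Clang, which predefine `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`; the function name `load_little_endian_i32` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit integer stored in little-endian order at `bytes`.
int32_t load_little_endian_i32(const uint8_t* bytes) {
  uint32_t raw;
  std::memcpy(&raw, bytes, sizeof(raw));  // well defined for any alignment
  // Assumption: GCC/Clang predefined macros. If they are missing,
  // the host is treated as little endian and no swap happens.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  raw = __builtin_bswap32(raw);           // big-endian host: reverse the byte order
#endif
  return static_cast<int32_t>(raw);
}
```

Given the bytes 04 03 02 01, the function returns 0x01020304 on both little-endian and big-endian hosts.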
19. Support Both Endians in Java
▪ Swap data for big endian
import java.nio.ByteOrder;
static final boolean LITTLE_ENDIAN =
  ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
int i32 = ...; // get the value from a buffer
if (!LITTLE_ENDIAN) {
i32 = Integer.reverseBytes(i32);
}
20. Potential Bad Smell and Enhancements
▪ Intra-process (i.e. In-memory)
– Read data from memory as a different data type
Little endian: memory layout 04 03 02 01 → 01020304
Big endian:    memory layout 04 03 02 01 → Swap → 01020304
21. Potential Bad Smell and Enhancements
▪ Intra-process (i.e. In-memory)
– Read data from memory as a different data type
▪ Inter-process (i.e. host – client)
– Exchange data with other machines
Intra-process, little endian: memory layout 04 03 02 01 → 01020304
Intra-process, big endian:    memory layout 04 03 02 01 → Swap → 01020304
Inter-process: memory layout 04 03 02 01 → Swap → memory layout 01 02 03 04 → 01020304
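For the inter-process case, one common approach is to fix a single byte order for exchanged data and convert at the boundary. The sketch below assumes a little-endian wire order, chosen for illustration only; `to_wire` and `from_wire` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: encode `value` in little-endian wire order on any host.
std::array<uint8_t, 4> to_wire(uint32_t value) {
  std::array<uint8_t, 4> wire = {{
      static_cast<uint8_t>(value),         // least significant byte goes first
      static_cast<uint8_t>(value >> 8),
      static_cast<uint8_t>(value >> 16),
      static_cast<uint8_t>(value >> 24)}};
  return wire;
}

// Hypothetical helper: decode a little-endian wire value on any host.
uint32_t from_wire(const std::array<uint8_t, 4>& wire) {
  return (static_cast<uint32_t>(wire[0])) |
         (static_cast<uint32_t>(wire[1]) << 8) |
         (static_cast<uint32_t>(wire[2]) << 16) |
         (static_cast<uint32_t>(wire[3]) << 24);
}
```

Because both functions work on values with shifts, the sender and the receiver agree on the bytes 04 03 02 01 for 0x01020304 regardless of their own endian.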
22. Can We Find All Issues?
▪ Find bad smells in source code
New code arrives every day
23. Automatically Detect Issues
▪ Continuously run test cases on machines with different endians
[Figure: Run test cases]
24. Automatically Detect Issues
▪ Continuously run test cases on machines with different endians
▪ Enhance code to support both endians if we find issues
[Figure: Run test cases, Enhance code]
25. CI Tools and Instances Help the OSS Community
▪ TravisCI
▪ Jenkins
▪ Virtual machine instance
26. Free Big-Endian Resources for the OSS Community
▪ TravisCI
– https://docs.travis-ci.com/user/multi-cpu-architectures/
▪ Jenkins
– https://osuosl.org/services/ibm-z/
▪ Virtual machine instance
– https://developer.ibm.com/components/ibm-linuxone/gettingstarted/
27. Apache Arrow Supports Both Endians
▪ Intra-process (from Apache Arrow 3.0)
– C and Java bindings
▪ Inter-process (from Apache Arrow 4.0)
– C bindings
CI on big endian runs on every PR update
28. Takeaway
▪ Machines use different endians
– Little endian and big endian
▪ When do we need to take care of endians?
– When reading a subset or superset of data in memory
– When exchanging data with other machines
▪ How to find potential issues and support both endians?
– Find bad smells
– Automatically run test cases
▪ How to keep interoperability in the AI ecosystem?
– CI is easy and free to run on different types of machines
Visit https://www.slideshare.net/ishizaki if you are interested in these slides