OS frontiers in the AI era

  1. OS frontiers in the AI era. Felix Xiaozhu Lin
  2. Who Am I • Fall 2020: Assoc. Prof, UVA CS (looking forward to it!) • 2014 – 2020: Asst. Prof, Purdue ECE • 2014: PhD in CS, Rice • 2008: MS + BS, Tsinghua
  3. What is an OS? Undergrads, (some) data scientists, tech media. Are these OSes?
  4. The era of AI! Where are we today? Image: https://en.wikipedia.org/wiki/Time_(xkcd)#/media/File:Xkcd_time_frame_0001.png
  5. The 3rd boom of AI [timeline figure, 1960s to 2020s, with ups and downs]: Classical AI, Expert systems, Machine learning
  6. Two Pendulums [timeline figure, 1960s to 2020s] • AI ups and downs: Classical AI, Expert systems, Machine learning • Centralized vs. decentralized: Dedicated hw & net; PC + Internet; Datacenters; Cloud computing; 5G, edge, embedded AI…
  7. Two Pendulums [timeline figure, 1960s to 2020s] • AI ups and downs: Classical AI, Expert systems, Machine learning • Centralized vs. decentralized: Dedicated hw & net; PC + Internet; Datacenters; Cloud computing; 5G, edge, embedded AI… • Generalized vs. specialized: Microcomputers, x86, DBMS, PC; Mainframes, Apollo11; Wearables/IoT, NoSQL, GPU/FPGA, RISCV
  9. My view of OS: OS == System Software == Software Infrastructure
  10. What you have learnt from undergrad OS class [shown on the Two Pendulums timeline of slide 7, with classical AI labeled GOFAI]
  11. What OS people are working on [shown on the Two Pendulums timeline of slide 7, with classical AI labeled GOFAI]
  12. So, where are we today?
  13. Where we are today [figure labels: Cloud, Edge, Devices]
      • Diverse
          – A few common OSes
          – Many specialized ones
          – Serving “things” in addition to humans
      • Defined by scenarios
          – Generic OSes → firmware
          – Specialized OSes → overlays
      • Blurry boundaries
          – arch, runtime, compiler, kernel, hypervisor, trusted exec environment…
  14. Three flavors of OS research: 1. Tune up 2. Do a specific thing well 3. Show possibility. Common themes: open blackboxes, break, & build
  15. Case 1: A high-speed stream analytics engine
  16. “Hot springs”: telemetry events. High-throughput. Sub-second delay. Timely processing before data gets cold! • Power sensor: 140M events/day • Oil rig: 1-2 TB/day • Manufacturing machines: PBs/day
  17. Stream analytics: state of the art
      • Classic engines?
          – StreamBase, Aurora, TelegraphCQ, NiagaraST…
          – Single threaded. Not scaling well.
      • Modern engines for datacenters?
          – Apache Flink, Spark Streaming, Beam…
          – Designed for tens to hundreds of machines. Scaling out.
          – Assuming it is okay if individual nodes perform poorly
          – As analytics moves to the edge → bad
  18. Project StreamBox: stream analytics at the memory speed [figure labels: stream pipeline, threads, ingestion, scheduler, mem]
      • RDMA
      • Co-designed with mm/scheduling
      • Squeeze parallelism for multi/manycore
      • Manage NUMA domains
      • Exploit high-bandwidth memory
      [ASPLOS'19] "StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory," Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin
      [USENIX ATC'17] "StreamBox: Modern Stream Processing on a Multicore Machine," Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin, in Proc. USENIX Annual Technical Conference, 2017.
      [ASPLOS'16] "memif: Towards Programming Heterogeneous Memory Asynchronously," Felix Xiaozhu Lin and Xu Liu, in Proc. ACM Int. Conf. Architectural Support for Programming Languages and Operating Systems, 2016.
  19. High-bandwidth hybrid memory [figure: cores with 3D DRAM (16 GB, 375 GB/s) and normal DRAM (~100 GB, ~100 GB/s)]. Tradeoffs: capacity vs. bandwidth. Untraditional memory hierarchy: no latency benefit, unlike SRAM+DRAM.
  20. Already on off-the-shelf machines: Intel Xeon Phi Knights Landing (KNL)
  21. Capacity: Use HBM only for grouping indexes [figure labels: cores, HBM, normal DRAM, streaming data, data bundles, index {key, pointer}]
  22. A system software’s approach to 3D DRAM. Blurry boundaries. [stack figure: apps, runtime, OS kernel, hybrid memory] • Cheap VM (huge page) • Fast net stack (40 GbE or RDMA) • High task parallelism • Custom mem allocator • Sequential mem access • Thread pool + custom task scheduler • Wide SIMD (avx512)
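
The idea on slide 21 (keep only a compact {key, pointer} index in the small, fast memory and leave full records in the large one) can be illustrated with a minimal Python sketch. This is not StreamBox-HBM code: Python cannot place data in HBM or DRAM, so the split here is conceptual, and the names `build_index`, `group_by_key`, and the sample telemetry records are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def build_index(bundle, key_of):
    """Build a compact list of (key, pointer) pairs.

    The pointer is just a position in the bundle; the full records stay put.
    """
    return [(key_of(record), ptr) for ptr, record in enumerate(bundle)]


def group_by_key(bundle, index):
    """Group records by sorting only the small index, then following pointers."""
    # Sorting touches the (key, pointer) pairs, not the full records.
    index_sorted = sorted(index, key=itemgetter(0))
    return {
        key: [bundle[ptr] for _, ptr in pairs]
        for key, pairs in groupby(index_sorted, key=itemgetter(0))
    }


# Hypothetical telemetry records: (sensor_id, timestamp, reading).
bundle = [("s2", 1, 0.5), ("s1", 2, 0.7), ("s2", 3, 0.9), ("s1", 4, 0.1)]
index = build_index(bundle, key_of=itemgetter(0))
groups = group_by_key(bundle, index)
```

The design point the sketch tries to capture is that the bandwidth-hungry step (sorting for grouping) works on small index entries, while the capacity-hungry data is read only when a pointer is dereferenced.
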
  23. Case 2: Autonomous AI on Cameras. [MobiSys'20] "Approximate Query Processing on Autonomous Cameras," Mengwei Xu, Xiwen Zhang, Yunxin Liu, Xuanzhe Liu, and Felix Xiaozhu Lin
  24. Video: a killer AI application
  27. Cut the cords
  28. Run on harvested energy: solar, wind
  29. … and on wireless
  30. Pervasive cameras: construction sites, farms, boats/RVs, warehouses. Photo credits: Reolink
  31. Elf: Running analytics on wire-free cameras? [figure label: object counts]
  32. Elf: Query model. Install, then query: (car, 30 mins). Sample & capture. • 7:00AM-7:30AM: [500 ± 100] cars • 7:30AM-8:00AM: [700 ± 140] cars • 8:00AM-8:30AM: [800 ± 180] cars • 8:30AM-9:00AM: [400 ± 100] cars • 9:30AM-10:00AM: [200 ± 80] cars (interval annotation: 200 −80, +80)
  33. Elf: System internals [figure labels: camera operating system, planning energy via reinforcement learning, sampled frames, selected NeuralNet, object counts, aggregator with error integration]
  34. Elf prototype: heterogeneous processors
  35. Counts vs. ground truth: 7:00AM-7:30AM [500 ± 100] cars; 7:30AM-8:00AM [700 ± 140] cars; 8:00AM-8:30AM [800 ± 180] cars; 8:30AM-9:00AM [400 ± 100] cars; 9:30AM-10:00AM [200 ± 80] cars. Error: 11%. Confidence interval width: 17%. Scenes: Auburn, AL; Hampton, NY; Jackson, WY; Taipei; Taipei. ~1000 hours
  36. OS for AI + AI for OS = Autonomous infrastructure [image text: M4 AUTONOMOUS Intelligence]
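
The query model on slides 32 to 35 reports a count together with an error bound after looking at only some frames. A minimal sketch of that general idea, assuming plain uniform random sampling and a normal-approximation confidence interval, is shown below. It is not Elf's aggregator or its reinforcement-learning energy planner; the function `estimate_mean_count`, the 1800-frame window, and the synthetic per-frame counts are hypothetical.

```python
import math
import random
import statistics


def estimate_mean_count(frame_counts, sample_size, rng, z=1.96):
    """Estimate the mean per-frame object count from a random sample of frames.

    Returns (estimate, half_width), where half_width is the half-width of a
    normal-approximation confidence interval with finite-population correction.
    """
    total = len(frame_counts)
    sample = rng.sample(frame_counts, sample_size)
    mean = statistics.fmean(sample)
    stdev = statistics.stdev(sample)  # needs sample_size >= 2
    # The correction shrinks the interval as the sample approaches the full window.
    fpc = math.sqrt((total - sample_size) / (total - 1))
    half_width = z * stdev / math.sqrt(sample_size) * fpc
    return mean, half_width


rng = random.Random(0)
# Hypothetical window: one object count per frame, 1800 frames in 30 minutes.
window = [rng.randint(0, 10) for _ in range(1800)]
estimate, half_width = estimate_mean_count(window, sample_size=60, rng=rng)
print(f"{estimate:.1f} ± {half_width:.1f} objects per frame")
```

Sampling more frames narrows the interval but costs more energy per window, which is the tradeoff an energy planner on a harvested-power camera would have to manage.
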
  37. Case 3: Kernel IO on co-processors
  39. Weak co-processors
  40. A heterogeneous SoC [figure: CPU at 2.5GHz, co-processor at 50MHz, DRAM, IO]
  41. Weak co-processors suit low-power IO tasks! High efficiency. [figure: the same SoC, with Linux kernel IO tasks]
  42. Kernel execution on weak co-processors? Diff ISA, no MMU, … [figure: the same SoC, with Linux kernel IO tasks]
  43. Co-processor translates unmodified kernel binary: dynamic binary translation of Linux kernel IO tasks. [USENIX ATC 19] "Transkernel: An Executor for Commodity Kernels on Peripheral Cores," Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin
  44. Takeaway: • Taming an OS kernel (a beast) for new hardware • … without re-engineering much of the software stack
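
Dynamic binary translation, the technique named on slide 43, can be illustrated with a toy: guest code is translated one basic block at a time into host code, and translated blocks are cached so that a loop is translated once and executed many times. The sketch below is only that toy. It is not Transkernel, which translates real kernel binaries across ISAs; the four-instruction guest ISA (`add`, `addi`, `jnz`, `halt`) and every function name are invented for illustration.

```python
HALT = -1  # sentinel "next pc" meaning the guest program has stopped


def make_addi(reg, imm):
    def host(regs):
        regs[reg] += imm
    return host


def make_add(dst, src):
    def host(regs):
        regs[dst] += regs[src]
    return host


def translate_block(program, start):
    """Translate the basic block starting at `start` into one host function."""
    body, pc = [], start
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "addi":
            body.append(make_addi(*args))
        elif op == "add":
            body.append(make_add(*args))
        elif op == "jnz":  # a conditional branch ends the block
            reg, target = args
            fallthrough = pc

            def block(regs):
                for host in body:
                    host(regs)
                return target if regs[reg] != 0 else fallthrough
            return block
        elif op == "halt":
            def block(regs):
                for host in body:
                    host(regs)
                return HALT
            return block
        else:
            raise ValueError(f"unknown guest op: {op}")


def run(program, regs):
    """Execute the guest program, translating each block on first use."""
    cache = {}  # translation cache: guest pc -> translated host function
    pc = 0
    while pc != HALT:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        pc = cache[pc](regs)
    return regs, len(cache)


# Guest program: acc += step, repeated n times.
program = [
    ("add", "acc", "step"),  # pc 0
    ("addi", "n", -1),       # pc 1
    ("jnz", "n", 0),         # pc 2: loop while n != 0
    ("halt",),               # pc 3
]
regs, blocks_translated = run(program, {"acc": 0, "step": 3, "n": 5})
```

Here the loop body runs five times but is translated only once, which is why a translation cache matters on a slow core.
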
  45. Recap [figure labels: StreamBox, Elf]
      • What are OSes in 2020?
      • Three OS projects:
          – spanning IoT, mobile, and datacenters
          – each with different flavors
      • The builder culture
          – open blackboxes
          – break things
          – build things from the ground up
      • Started by a small group of hardcore hackers
      • Now more diverse and inclusive
      • A brave new world
  46. Image credits
      • https://dsportmag.com/the-tech/test-n-tune/test-tune-2017-subaru-wrx-sti-part3-closer-to-the-ej257s-limits/
      • https://techcrunch.com/2009/11/19/redneck-rampage-a-truck-with-a-jet-engine/?utm_source=feedburner#038;utm_medium=email
