IMAX2/3 Docs/Tutorials

Download IMAX2/3

Introduction to IMAX3: Amazing Dataflow-Centric Gen4-CGLA (non-CGRA) (CGLA: Coarse-Grained Linear Array)

Introductory slides with synthesizable notes

0.非常識に理解するコンピュータ(0.予告編) 0.IMAX3 begins(0.Trailer)
1.非常識に理解するコンピュータ(1.集めたデータはどこに置くのがいいの?) 1.IMAX3 begins(1.Where is the best location to save data?)
2.非常識に理解するコンピュータ(2.データに置き方ってあるの?) 2.IMAX3 begins(2.Is there a manner to put data?)
3.非常識に理解するコンピュータ(3.計算って何のこと?) 3.IMAX3 begins(3.What do you mean by calculation?)
4.非常識に理解するコンピュータ(4.押しかけるのがいいの?待つのがいいの?) 4.IMAX3 begins(4.Should I push? Should I wait?)
5.IMAX3 begins(5.What should I study to get paid?)

Expert-level slides with synthesizable notes

0.Let's start Gen3-CGLA(non-CGRA)
1.Introduction
2.Image filters basic
3.Image filters advanced
4.Image filters professional
5.Machine Learning
6.High-degree stencil computation
7.Inverse matrix
8.Sparse matrix and Sorting
9.Hash, FFT and String search
10.High-speed compiler
11.Three level sophisticated loop
12.拡張性編 12.Scalability
13.HW/SW協調設計編 13.HW/SW codesign
0-13.短い総集編(#1-#13) 0-13.Short summary(#1-#13)
0-13.長い総集編(#1-#13) 0-13.Long summary(#1-#13)
14.CPU/Vectorとの違い編 14.Difference from CPU/Vector
15.ソフト制御キャッシュの仕組み 15.Software-controlled cache memory
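16.Compatibility with chiplets
17.Dataflow flexibility and optimization guidelines
18.Mapping 4-dimensional array computations
19.Getting chat.py to run on IMAX3
20.CGLAあみだくじ 20.Decision Tree
21.Hands-on project
22.Enabling the host cache memory
23.LLAMA
24.Types of dataflow and mapping
25.More LLM
26.Catalog for startups
27.Summary of patents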

IMAX2 Kit

ZCU102 (8 units) ... Vivado project is included.

  1. linux# zcat ZCU102-step4000-20221020.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. zcu102# insert SDcard
  6. zcu102# boot from SDcard
  7. zcu102# create users
  8. zcu102% extract proj-arm64.tgz (NFS is recommended)
  9. zcu102% proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-8st (matrix multiplication)
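Step 3 (replacing the root password in /mnt/etc/shadow) can be scripted on the Linux host. The sketch below is an assumption, not part of the kit: the hypothetical `set_root_hash` takes a ready-made crypt hash (for example from `openssl passwd -6`, available in OpenSSL 1.1.1 and later, assuming the image's libc accepts SHA-512 `$6$` hashes as glibc does) and writes it into the second field of the `root` entry only.

```shell
#!/bin/sh
# Hypothetical helper for step 3: put a precomputed password hash into the
# root entry of a shadow file (e.g. /mnt/etc/shadow on the mounted SD card).
# Usage: set_root_hash SHADOW_FILE HASH
set_root_hash() {
    shadow=$1
    pw_hash=$2
    [ -f "$shadow" ] || { echo "no such file: $shadow" >&2; return 1; }
    grep -q '^root:' "$shadow" || { echo "no root entry in $shadow" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Field 2 of a shadow entry is the password hash; rewrite it for root only.
    if ! awk -F: -v OFS=: -v h="$pw_hash" '$1 == "root" { $2 = h } { print }' "$shadow" > "$tmp"; then
        rm -f "$tmp"; return 1
    fi
    # Copy back with cat so the file keeps its original owner and mode.
    cat "$tmp" > "$shadow"
    rm -f "$tmp"
}

# Steps 2-4, run as root on the Linux host after the image has been written:
#   mount /dev/mmcblk0p2 /mnt
#   set_root_hash /mnt/etc/shadow "$(openssl passwd -6)"
#   umount /mnt
```

Taking the hash as an argument keeps password prompting out of the helper; `openssl passwd -6` asks for the password interactively and prints the hash.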

ZCU111 (16 units)

  • 250MHz, IMAX2 16 cores, 640 operations / 4 cycles, Cache/core 128KB
  • 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
  • proj-arm64/fpga/ZCU111-step4000-20220301.img.gz
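For a rough sense of scale, the figures above can be combined into a peak rate and an aggregate cache size. This arithmetic assumes that "640 operations / 4 cycles" is the total for all 16 cores (not per core) and that "Cache/core 128KB" is per core; the result is a theoretical upper bound, not a measurement.

```latex
\begin{aligned}
\text{peak rate} &= \frac{640\ \text{ops}}{4\ \text{cycles}} \times 250 \times 10^{6}\ \frac{\text{cycles}}{\text{s}}
  = 160\ \frac{\text{ops}}{\text{cycle}} \times 2.5 \times 10^{8}\ \frac{\text{cycles}}{\text{s}}
  = 4 \times 10^{10}\ \frac{\text{ops}}{\text{s}} = 40\ \text{Gop/s} \\
\text{total cache} &= 16\ \text{cores} \times 128\ \text{KB} = 2048\ \text{KB} = 2\ \text{MB}
\end{aligned}
```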
  1. linux# zcat ZCU111-step4000-20220301.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. zcu111# insert SDcard
  6. zcu111# boot from SDcard
  7. zcu111# create users
  8. zcu111% extract proj-arm64.tgz (NFS is recommended)
  9. zcu111% proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (matrix multiplication)

ZU19EG (16 units) ... Vivado project is included.

  1. linux# zcat ZU19EG-step4000-20241111.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. zu19eg# insert SDcard
  6. zu19eg# boot from SDcard (dhcp)
  7. linux% ssh -Y <user>@<zu19eg-address> (log in with X11 forwarding)
  8. zu19eg% zcat proj-arm64.tgz|tar xpf -
  9. zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (matrix multiplication)
  10. When sudo asks for a password, enter: temppwd
  11. If the run fails with "localhost:11.0: Cannot open display", give root your X authority file (sudo runs the program as root):
  12. zu19eg% cp ~/.Xauthority /tmp/111
  13. zu19eg% sudo cp /tmp/111 /root/.Xauthority
  14. zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (retry)
  15. Example output (times in microseconds; ORIG is the ARM-only reference run, IMAX the accelerated run):
      <<<ORIG>>>
      usec: ARM:2098589 DRAIN:0 CONF:0 REGV:0 RANGE:0 LOAD:0 EXEC:0 total:2098589 (usec)
      <<<IMAX>>>
      usec: ARM:426 DRAIN:1224 CONF:105 REGV:1041 RANGE:663 LOAD:14861 EXEC:24324 total:42647 (usec)
  16. zu19eg% cd proj-arm64/sample/mm_cnn_lf
  17. zu19eg% make -f Makefile-zynq.emax6+dma mm-zynq.emax6+dma-16st (how to build the binary)
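The `<<<ORIG>>>` and `<<<IMAX>>>` blocks printed by the sample report elapsed time in microseconds for the ARM-only reference and for the IMAX run. A small hypothetical helper, `imax_speedup`, can pull out the two `total:` values and print their ratio; with the numbers shown above it gives 2098589 / 42647, about 49.21. The field layout is taken from that sample output; other programs may print differently.

```shell
#!/bin/sh
# Hypothetical helper: read the sample's timing report on stdin and print
# ORIG total / IMAX total, i.e. how many times faster the IMAX run was.
imax_speedup() {
    awk '
        /<<<ORIG>>>/ { sec = "ORIG" }
        /<<<IMAX>>>/ { sec = "IMAX" }
        $1 == "usec:" && sec != "" {
            # Fields look like "total:42647"; keep the value after the colon.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^total:/) { split($i, kv, ":"); total[sec] = kv[2] }
        }
        END {
            if (!("ORIG" in total) || !("IMAX" in total) || total["IMAX"] == 0) exit 1
            printf "%.2f\n", total["ORIG"] / total["IMAX"]
        }'
}

# Example (assuming the report goes to stdout):
#   sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st | tee run.log
#   imax_speedup < run.log
```

The helper exits with status 1 if either block is missing, so it can be used in scripts that compare several builds.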

ZCU102+VU440 (64/128/192/256/512 units) ... Vivado project is included.

  1. vu440# connect with zcu102 (see figure)
  2. vu440# write VU440-step4000-20221020-V24.1-78.125+78.125+48+260+130+48-CRYPTO-SPU.bin to SDcard
  3. vu440# insert SDcard
  4. linux# zcat ZCU102-step4000-20201010.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
  5. linux# mount /dev/mmcblk0p2 /mnt
  6. linux# replace root-password in /mnt/etc/shadow
  7. linux# umount /mnt
  8. zcu102# insert SDcard
  9. zcu102# boot from SDcard
  10. zcu102# create users
  11. zcu102% extract proj-arm64.tgz (NFS is recommended)
  12. zcu102% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma (matrix multiplication)

IMAX3 Kit

VMK180 (32 units) ... Vivado project is included.

  1. linux# zcat VMK180-step4000-20230410.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
  2. linux# mount /dev/mmcblk0p2 /mnt
  3. linux# replace root-password in /mnt/etc/shadow
  4. linux# umount /mnt
  5. vmk180# insert SDcard
  6. vmk180# boot from SDcard
  7. vmk180# create users
  8. vmk180% extract proj-arm64.tgz
  9. vmk180% proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix multiplication)

VMK180 (32*2 units) ... Vivado project is included.

  1. linux# zcat VMK180-step4200-MASTER.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the master board)
  2. linux# zcat VMK180-step4200-SLAVE.img.gz | dd bs=64k of=/dev/mmcblk0 (a second 32GB SDcard for the slave board)
  3. linux# mount /dev/mmcblk0p2 /mnt
  4. linux# replace root-password in /mnt/etc/shadow
  5. linux# umount /mnt
  6. vmk180# connect the two boards with a QSFP28 AOC cable
  7. vmk180# insert each SDcard into its board
  8. vmk180# boot both boards from SDcard
  9. vmk180# create users
  10. vmk180% extract proj-arm64.tgz
  11. vmk180% proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix multiplication)
  12. vmk180% proj-arm64/sample/test/test025-acap.emax7+dma-32st (dual matrix multiplication)
  13. vmk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
  14. vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv1+fc inference)
  15. vmk180% ./tsim-vmk180.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv1+fc training)
  16. vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv3+fc inference)
  17. vmk180% ./tsim-vmk180.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv3+fc training)
  18. vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
  19. vmk180% ./tsim-vmk180.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
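The six `tsim` runs above differ only in a few options. Reading the annotations, `-I0`/`-I1` appear to select MNIST/CIFAR10, `-C`/`-F` the number of convolution and fully connected layers, `-i -r` inference and `-t` training; `-x` is passed to every run and its meaning is not stated here. The hypothetical `tsim_cmd` and `run_all` below rebuild those command lines from that reading (an inference from the list, not documented option semantics) so all six can be run in one loop.

```shell
#!/bin/sh
# Hypothetical wrapper around the tsim runs listed above.
# Option meanings are inferred from the annotations, not from tsim documentation.
TSIM=${TSIM:-./tsim-vmk180.emax7+dma}

# Usage: tsim_cmd DATASET MODE CONV FC
#   DATASET: mnist | cifar10     MODE: infer | train
tsim_cmd() {
    case $1 in
        mnist)   ds=-I0 ;;
        cifar10) ds=-I1 ;;
        *) echo "unknown dataset: $1" >&2; return 1 ;;
    esac
    case $2 in
        infer) mode="-i -r" ;;
        train) mode="-t" ;;
        *) echo "unknown mode: $2" >&2; return 1 ;;
    esac
    echo "$TSIM -x $mode $ds -C$3 -F$4"
}

# Run the six configurations in the same order as the list.
run_all() {
    for cfg in "mnist 1 1" "mnist 3 1" "cifar10 6 2"; do
        set -- $cfg    # split into dataset, conv layers, fc layers
        for m in infer train; do
            cmd=$(tsim_cmd "$1" "$m" "$2" "$3") || return 1
            echo ">>> $cmd"
            $cmd || return 1    # unquoted on purpose: split into program and options
        done
    done
}
# From proj-arm64/sample/tsim:  run_all
```

Setting `TSIM=./tsim-acap.emax7+dma` before running the script reuses the same loop for the binary name used in the VPK180 list.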

VPK180 (64*2 units)

VPK180 (64*8 units)

  1. linux# zcat alice120-step4800-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the master board)
  2. linux# zcat alice122-step4800-slave1.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 1)
  3. linux# zcat alice124-step4800-slave2.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 2)
  4. linux# zcat alice126-step4800-slave3.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for slave board 3)
  5. linux# mount /dev/mmcblk0p2 /mnt
  6. linux# replace root-password in /mnt/etc/shadow
  7. linux# umount /mnt
  8. vpk180# connect the four boards with QSFP-DD DAC cables
  9. vpk180# insert each SDcard into its board
  10. vpk180# boot all boards from SDcard
  11. vpk180# create users
  12. vpk180% extract proj-arm64.tgz
  13. vpk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma (matrix multiplication)
  14. vpk180% sudo proj-arm64/sample/test/test025-acap.emax7+dma (dual matrix multiplication)
  15. vpk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
  16. vpk180% ./tsim-acap.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv*1+fc inference)
  17. vpk180% ./tsim-acap.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv*1+fc training)
  18. vpk180% ./tsim-acap.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv*3+fc inference)
  19. vpk180% ./tsim-acap.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv*3+fc training)
  20. vpk180% ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
  21. vpk180% ./tsim-acap.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
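Every write in steps 1-4 targets the same reader device, /dev/mmcblk0, so the cards have to be swapped between writes. The sketch below loops over the images and pauses for a card swap before each one. The function name `flash_cards`, the `DRY_RUN` switch and the prompt are illustrative assumptions; `dd` overwrites the whole device, so check the device name before a real run.

```shell
#!/bin/sh
# Hypothetical helper: write several SD-card images, one card per image,
# through a single card reader. Set DRY_RUN=1 to print the commands only.
# Usage: flash_cards DEVICE IMAGE...
flash_cards() {
    dev=$1; shift
    # Check every image up front so a typo does not stop the run halfway.
    for img in "$@"; do
        [ -f "$img" ] || { echo "missing image: $img" >&2; return 1; }
    done
    for img in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "zcat $img | dd bs=64k of=$dev"
            continue
        fi
        printf 'Insert the SD card for %s into %s, then press Enter: ' "$img" "$dev"
        read -r reply || return 1
        # Refuse to overwrite a card whose partitions are still mounted.
        if grep -q "^$dev" /proc/mounts; then
            echo "$dev is mounted; unmount it first" >&2
            return 1
        fi
        # Without pipefail, the status checked here is dd's, not zcat's.
        zcat "$img" | dd bs=64k of="$dev" || return 1
        sync
    done
}

# Example (as root, in the directory holding the images):
#   flash_cards /dev/mmcblk0 alice120-step4800-master.img.gz \
#       alice122-step4800-slave1.img.gz alice124-step4800-slave2.img.gz \
#       alice126-step4800-slave3.img.gz
```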