Conversation
- delay swap_plan() by 3 more iterations
- update train
- correct swap_sched(), swap_select(), swap_plan() (a hedged sketch of this pipeline follows after this list)
- correct load update in swap_select
- vec_run changed to new 3 iterations
- correct vec_run36 index issue
- correct overhead issue, verify vec_run.t
- vec_run duplicated to avoid sorting issue
- verified item-5 indices in Table_sched
- vec_swap_select passed by reference in swap_sched()
- impl swap_update_tables() before DeploySwap(), both at Append() for the time being; remove negative r_idx items && git push origin vd1
- handle last iteration by impl of sizeSqn and verification to change asyncSwapFlag back to 0
- correct swap_construct_tables(); include negative r_idx for swap_update_tables() and DeploySwap()
- include negative r_idx for DeploySwap()
- impl GetRealGpuPtr() to swap in nullptr Block at last iteration
- impl GetRealGpuPtr(), and optimize data() and mutable_data()
- verify const issue
- change to return tempData instead of updating data_, without removing erasing in Table_not_at_device
- milestone of last iteration, at 550 MB
- new class of pool: SwapPool; important APIs: PoolOpt(), Malloc(), Free()
- PoolOpt() takes in M/F sequences including those induced by swapping
- cross-iteration variables and last iteration case solved
- record down MF after swap done, for one iteration
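The commit summaries above reference a swap planning/selection/scheduling pipeline (swap_plan(), swap_select(), swap_sched(), DeploySwap(), Table_sched). The following is only a rough illustrative sketch of how such a pipeline could be wired together; the class, member, and tuple-field names below are assumptions made for this sketch, not the actual SwapGPU implementation.

```cpp
// Hypothetical sketch only -- names, signatures, and the greedy selection
// policy are assumptions for illustration, not SINGA's actual SwapGPU code.
#include <algorithm>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct SwapCandidate {
  int r_idx;    // index of the block's operation in the run sequence
  size_t size;  // bytes freed on the GPU if this block is swapped out
};

class SwapScheduler {
 public:
  // Plan after a few warm-up iterations: select blocks, then schedule them.
  void Plan(const std::vector<SwapCandidate>& candidates, size_t bytes_to_free) {
    std::vector<SwapCandidate> selected;
    Select(candidates, bytes_to_free, selected);  // selection passed by reference
    Schedule(selected);                           // fills swap_schedule_
  }

  // Called from Append(): trigger swap-out/swap-in when the current index
  // matches a scheduled entry (the actual async copies are omitted here).
  void DeploySwap(int cur_idx) {
    auto it = swap_schedule_.find(cur_idx);
    if (it == swap_schedule_.end()) return;
    // tuple fields are assumed to be (out_idx, in_idx, sync_r_idx, block_id)
  }

 private:
  // Greedily pick the largest candidates until enough bytes are freed.
  void Select(std::vector<SwapCandidate> candidates, size_t bytes_to_free,
              std::vector<SwapCandidate>& selected) {
    std::sort(candidates.begin(), candidates.end(),
              [](const SwapCandidate& a, const SwapCandidate& b) {
                return a.size > b.size;
              });
    size_t freed = 0;
    for (const auto& c : candidates) {
      if (freed >= bytes_to_free) break;
      selected.push_back(c);
      freed += c.size;
    }
  }

  void Schedule(const std::vector<SwapCandidate>& selected) {
    for (const auto& c : selected)
      swap_schedule_[c.r_idx] = std::make_tuple(c.r_idx, -1, -1, -1);
  }

  std::map<int, std::tuple<int, int, int, int>> swap_schedule_;
};
```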
CMakeLists.txt
Outdated
| include(ExternalProject) | ||
| ExternalProject_Add(cnmem | ||
| GIT_REPOSITORY "https://github.com/nusdbsystem/cnmem.git" | ||
| GIT_REPOSITORY "https://github.com/junzhezhang/cnmem.git" |
did you change cnmem source code?
| @@ -31,24 +31,25 @@ | |||
| import os | |||
pls keep the cnn.py (instead of alexnet.py)
I think you don't need to change the example model code.
include/singa/core/common.h
Outdated
| Block(void* ptr, size_t size, size_t offset = 0) | ||
| : data_(ptr), size_(size), offset_(offset) { | ||
| Block(void* ptr, size_t size, size_t offset = 0, Device* ptrDevice = nullptr) | ||
| : data_(ptr), size_(size), offset_(offset), ptrDevice_(ptrDevice) { |
include/singa/core/device.h
Outdated
| ///SwapGPU | ||
| struct onePieceMsg{ | ||
| /* | ||
| members: [ptr, size, MallocFree, idx] |
pls make the names consistent: MallocFree -> malloc_free
include/singa/core/device.h
Outdated
| int MallocFree; | ||
| int idx; | ||
| double t; | ||
| onePieceMsg(string p, size_t s, int M, int i):ptr(p),size(s),MallocFree(M),idx(i){} |
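For illustration, a version of the struct with the naming the reviewer asks for might look like the sketch below; the field semantics are inferred from the diff above and are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Sketch of the suggested rename; field meanings are inferred, not confirmed.
struct OnePieceMsg {
  std::string ptr;   // block pointer encoded as a string
  size_t size;       // block size in bytes
  int malloc_free;   // operation type: malloc or free (encoding assumed)
  int idx;           // position in the recorded operation sequence
  double t = 0;      // time stamp
  OnePieceMsg(std::string p, size_t s, int mf, int i)
      : ptr(std::move(p)), size(s), malloc_free(mf), idx(i) {}
};
```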
include/singa/core/device.h
Outdated
| /// Called by Tensor. | ||
| void FreeBlock(Block* block); | ||
| void AppendInfo(string blockInfo); |
any comments on the blockInfo? better give an example.
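One way to answer this: based on the "opt_type, ptr, time_stamp" comment in the common.cc hunk further down, the appended string presumably looks something like the illustrative (and assumed, not verified) example below.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Illustrative only: the exact field order and separators are assumptions
// based on the "opt_type, ptr, time_stamp" comment in common.cc.
std::string MakeBlockInfo(const std::string& opt_type, const void* ptr) {
  std::stringstream strm;
  strm << opt_type << " " << ptr << " "
       << std::chrono::steady_clock::now().time_since_epoch().count();
  return strm.str();  // e.g. "Read 0x7f3a2c000000 1541752810123456"
}
```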
include/singa/core/device.h
Outdated
| int id() const { return id_; } | ||
| virtual void* GetRealGpuPtr(const Block* block_) = 0; |
what do you mean by real gpu ptr?
| ///SwapGPU | ||
| struct onePieceMsg{ | ||
| /* | ||
| members: [ptr, size, operation_type, idx] |
include/singa/core/device.h
Outdated
| map<int,std::tuple<int,int,int,int>>Table_sched; // changed to with sync_r_idx | ||
| //vec_block | ||
| vector<string>vec_block; //iteration 0-3 |
the comments cannot explain the code.
include/singa/core/memory.h
Outdated
| vector<string> vec; | ||
| vector<string> vec_block_RW; | ||
| vector<string> vec_block_RWMF; | ||
| map<int,int>Table_r2d; //full duration info, cross-iteration duration. |
the names Table_r2d and Table_d2r are not descriptive..
src/core/common/common.cc
Outdated
| ptr_device_->AppendInfo(temp); | ||
| } | ||
| if (data_ == nullptr) { | ||
| cout<<"before GetRealGpuPtr, block_ and data_: "<<this<<' '<<data_<<endl; |
cout??
It will dump too many prints on the screen..
src/core/device/swap_gpu.cc
Outdated
| }; | ||
| struct oneIterMsg{ |
this kind of name is not descriptive.
src/core/device/swap_gpu.cc
Outdated
| //vector of pairMsg is used in run. | ||
| //vector of iterMsg is used in test. | ||
| vector<onePieceMsg> swap_strVec_2_pieceMsgVec(vector<string> vec, int &idxRange){ |
don't mix different naming styles..
src/core/device/swap_gpu.cc
Outdated
| idxRange = static_cast<int>(onePieceMsgVec_.size()); | ||
| return onePieceMsgVec_; | ||
| }// end of strVec_2_pieceMsgVec function |
this comment is meaningless..
better add some comments for the functionality or the arguments.
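Picking up both comments (naming style and documentation), a possible shape for the declaration is sketched below; the parameter meanings are assumptions and the parsing body is deliberately left out.

```cpp
#include <string>
#include <vector>

struct OnePieceMsg;  // structured record per malloc/free operation (see earlier sketch)

/// Convert the raw strings collected by AppendInfo() into structured
/// messages, one per malloc/free operation.
/// \param block_strings  raw per-operation strings from Append()
/// \param idx_range      output: number of messages produced
/// (Declaration-only sketch; the real parsing in swap_gpu.cc is not reproduced.)
std::vector<OnePieceMsg> StrVecToPieceMsgVec(
    const std::vector<std::string>& block_strings, int& idx_range);
```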
Updated the PR as required by the 09 Nov meeting, focusing on correctness and code readability.
nudles left a comment:
- can you separate the Pool and Swap+Pool into 2 pull requests?
- please replace the strings with structs.
| train((train_x, train_y, test_x, test_y), net, 250, vgg_lr, 0.0005, | ||
| use_cpu=args.use_cpu) | ||
| use_cpu=args.use_cpu,batch_size=args.batch_size) | ||
| else: |
again, it would be better to keep the original example code.
| #include <memory> | ||
| #include "singa/utils/logging.h" | ||
| #include <string> |
include/singa/core/device.h
Outdated
| /// Called by Tensor. | ||
| void FreeBlock(Block* block); | ||
| void AppendInfo(string block_info); |
as discussed, please use a structure or class for block_info instead of string.
| float mem_limit_ratio = 0.70; | ||
| size_t smallest_block = 1<<20; //1 MB | ||
| int data_buffer = 4; // used to control readyIdx | ||
| int mutable_data_buffer = 6; |
src/core/common/common.cc
Outdated
| //Append block info: opt_type, ptr, time_stamp | ||
| if (ptr_device_!=nullptr){ | ||
| //Append info. | ||
| stringstream strm2; |
the code would be cleaner if the strings are replaced with struct.
| int name; | ||
| size_t size; | ||
| int r_idx; | ||
| int d_idx; |
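For context, a self-contained version of such a record might look like the sketch below; the struct name and the field meanings are inferred from the diff and should be treated as assumptions.

```cpp
#include <cstddef>

// Hypothetical name; field meanings inferred from the diff above.
struct BlockOptInfo {
  int name;      // numeric id assigned to the block (replacing the pointer string)
  size_t size;   // block size in bytes
  int r_idx;     // operation index within one iteration
  int d_idx;     // operation index over the full recorded duration
};
```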
Replaced the strings with structs in the Append function as requested; updated in branch vd2. A new PR could not be created against Apache master, so it was created against my forked repo. Regarding the other request, separating Pool and Swap+Pool into 2 PRs is not possible in git, as the changes were mixed together; however, they are well separated into different Device and Memory classes.
new class of pool: SwapPool
important APIs: PoolOpt(), Malloc(), Free()
PoolOpt() takes in the malloc/free (M/F) sequence, including operations induced by swapping
cross-iteration variables and the last-iteration case are solved.
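To make the summary above concrete, a rough interface sketch follows; the class layout, signatures, and members are assumptions for illustration and are not claimed to match the actual SwapPool code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rough interface sketch only -- all signatures and members are assumptions.
class SwapPool {
 public:
  // Takes the recorded malloc/free (M/F) sequence of one iteration, including
  // the operations induced by swapping, and builds an offset plan for a single
  // pre-allocated region.
  void PoolOpt(const std::vector<std::string>& malloc_free_seq);

  // After PoolOpt() has run, serve requests from the planned region; before
  // that, fall back to plain device allocation (assumed behavior).
  void Malloc(void** ptr, size_t size);
  void Free(void* ptr);

 private:
  void* pool_ptr_ = nullptr;                        // one large pre-allocated region
  std::map<void*, size_t> allocated_;               // live allocations -> size
  std::map<int, std::pair<size_t, size_t>> plan_;   // op index -> (offset, size)
};
```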