programming – 第 2 頁 – Algorithm Notes

31 12 月, 2016

zlib example

#include <stdio.h>

#include <zlib.h>
#include <cstdlib>
#include <vector>

using namespace std;

void compressFile(const char *fileName, const vector<char> &in) {
	//we will use GZip from zlib
	gzFile gz_file;
	//open the file for writing in binary mode
	gz_file = gzopen(fileName, "wb");

	//Get the size of the stream
	unsigned long int file_size = sizeof(char) * in.size();
	//Write the size of the stream, this is needed so that we know
	//how much to read back in later
	gzwrite(gz_file, (void*) &file_size, sizeof(file_size));
	//Write the data
	gzwrite(gz_file, (void*) in.data(), file_size);
	//close the file
	gzclose(gz_file);
}

void decompressFile(const char *fileName, vector<char> &out) {
	//open the file for reading in binary mode
	gzFile gz_file = gzopen(fileName, "rb");
	//this variable will hold the size of the file
	unsigned long int size;
	//we wrote out a unsigned long int when storing the file
	//read this back in to get the size of the uncompressed data
	gzread(gz_file, (void*) &size, sizeof(size));
	//resize the string
	out.resize(size / sizeof(char));
	//read in and uncompress the entire data set at once
	gzread(gz_file, (void*) out.data(), size);
	//close the file
	gzclose(gz_file);
}

int main(int argc, char* argv[])
{
	vector<char> in, out;

	// generate data
	for (int i = 0; i < 1920*1080; ++i) {
		int d = rand();
		for (int j = 0; j < 8; ++j) {
			char d2 = (d >> j) & 0xF;
			in.push_back(d2);
		}
	}

	compressFile("com.gz", in);
	decompressFile("com.gz", out);

	// check decompressed data
	for (int i = 0; i < (int) out.size(); ++i) {
		if ( in[i] != out[i] ) {
			printf("error\n");
		}
	}
    return 0;
}

16 11 月, 2016

C programming language: Volatile is dangerous

最近真的是被 volatile 搞到了

一直以來 volatile 的用法，其實不只是單純的每次執行到該指令時從記憶體內重新讀取，不使用 CPU 內 cache，因為在執行階段記憶體的內容可能被 interrupt 更新，在經過 compiler 最佳化過後會產生意想不到的執行結果

一個經典 while loop 例子

volatile int loop = 1;
while ( loop ) {
    printf("I'm running.\n");
}

如果沒有 volatile 修飾，compiler 最佳化後的結果是不可能跳出 while 迴圈的，但事情恐怕不是只有取消 cache 記憶體讀取這麼簡單

考慮下面一個簡單的迴圈

for (int i = 0; i < 256; ++i) {
    *((volatile unsigned long *)(0x12345678)) = i;
}

我們對記憶體 0x12345678 重複寫入不同值，這在 firmware 操作經常用來對硬體填入一個巨大的陣列，但硬體只提供一個固定位置的 register interface

如果沒有 volatile 修飾，上面這段 code 可能被 compiler 最佳化成

*((unsigned long *)(0x12345678)) = 255;

意思是我們的表個僅填入最後一筆值，不會正確的填寫完整的陣列

但這並不代表我們應該取消 compiler 最佳化，或是瘋狂的在所有存取指令加入 volatile ，而是需要去了解 code implicit 意義

8 11 月, 201615 11 月, 2016

OpenMP nested parallelization

OpenMP 預設只會平行化第一階層 directives ，當被平行化的 thread 再呼叫 OpenMP 時會自動被忽略，但是我們可以強制展開巢狀階層的 directives ，對於 function 呼叫產生新的 thread 也同樣有效

兩種解法，一是在程式碼中插入omp_set_nested(1)，另外就是從環境變數著手，如下兩種都可以

第一種方法是在程式碼中呼叫 omp_set_nested ，使用需要 include omp.h

omp_set_nested(1);

第二種方法則是設定環境變數，注意環境變數的優先權比 omp_set_nested 低，所以會被覆蓋

set OMP_NESTED=TRUE

一個完整的 nested openmp parallel for 範例，這邊使用 omp_set_nested ，第一次呼叫 fun() 時會同時產生 4 個 thread，第二次呼叫 fun() 時由於沒有 nested parallelization 則是 2 個 thread

#include <omp.h>
#include <stdio.h>
#include <time.h>
#include <windows.h>
 
void fun() {
	#pragma omp parallel for
	for (int i = 0; i < 2; ++i) {
		#pragma omp parallel for
		for (int j = 0; j < 2; ++j) {
			printf("%d %d %d\n", i, j, (int)clock());
			Sleep(1000);
		}
	}
}

int main() {

	omp_set_nested(1);
	fun();
	
	omp_set_nested(0);
	fun();
	
	return 0;
}

執行結果如下，當 omp_set_nested 設為 1 時，第二層 loop 是同時被平行化的，反之則是無平行化循序執行

14 10 月, 201615 10 月, 2016

Traverse Clang AST: visitor

我們在前一篇 Clang AST dump 中交代了指令可以利用 clang 完整 dump AST。其實已經可以自己寫程式 traverse AST 了。但又何必再 parse 一次呢? 可以直接利用 libclang 使用建立好的 AST。

閱讀全文〈Traverse Clang AST: visitor〉

9 10 月, 20169 10 月, 2016

Build LLVM+Clang from source code with mingw

這邊不筆記一下不行，人老了沒辦法記這麼詳細了，發揮打遊戲查攻略的精神搜尋資料和嘗試錯誤!! 這也太難，光是編譯就可以讓一堆人打退堂鼓了…

我就是想在 win7 + mingw 環境下用 clang ! 官網都不提供只好自己來編譯了。

Reference

Requirement: 確定都有寫進 system path

mingw32 或 mingw32-w64
cmake
python 2 或 python 3 都可以
我的 OS: windows 7 64-bit，有人則是在 linux 系統中編譯 cross-platform target 比較好設定，網路上的步驟也幾乎都是 linux 系統指令

Downloads: 一堆 source code 還要放在特定的結構目錄下才能正常編譯

LLVM source code: 重新命名資料夾 llvm
Clang source code: 重新命名資料夾為 clang，放在 llvm/tools/
clang-tools-extra source code: (optional) 重新命名資料夾為 extra，放在 llvm/tools/clang/tools/, 這樣才有些其他 clang tool. eg. AST matcher
compiler-rt source code: (optional) 重新命名資料夾為 compiler-rt，放在資料夾 llvm/projects/
其他 source code 幾乎都是放在 llvm/projects/ 目錄下

Steps

執行 cmake-gui，設定好 source & build path
執行 configure，選 mingw build
找到 CMAKE_BUILD_TYPE，輸入值 Release 或 MinSizeRel，才可以編譯比較小的 LLVM，否則 lib大小很驚人
執行 generate，確定沒有錯誤訊息
cmd 進入 build 目錄下，執行 mingw32-make 編譯
同目錄執行 mingw32-make install 完成 install目錄 (官方文件提示要先 mingw32-make check-all，但 win7 環境下都失敗不管它了)
lib 目錄下要 include 的東西太多可以 archive 起來，之後要引入 linker 比較方便，這裡範例使用 thin archive，只打包檔名，並非真的打包所有 *.a。在 llvm/lib/ 目錄下輸入
```
ar -rcT libclang.a *.a
```
以後要 link clang 只要 link libclang.a 即可。若使用官網提供 VC++ 編譯好的 binary，則是可以在 link time 時將 libclang.lib 當成 object file 來 link 。

這樣編譯好的 clang.exe 基本上還是使用 mingw 的 standard library，需要 mingw 的 runtime 才能執行，使用 clang.exe 前確保 mingw 有設定進 system path

編譯好的懶人包

8 10 月, 20169 10 月, 2016

Clang AST dump

設計一個 main.cpp ，嘗試用 clang 把 AST 印出來，兩種指令都可以，如果有引入其他 lib 可能會 dump 出一堆東西(如 #include<time.h>)，畢竟 clang 預設指令不含 preprocessor，不會去認這是不是 standard library

#define NUM 10
int   REG_A;
float REG_B      = NUM;
int   REG_C[NUM] = {0};

extern void autog();

void autoGen(const float &in) {
	REG_A = NUM;
}

clang-check.exe -ast-dump main.cpp --

TranslationUnitDecl 0x351a40 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x351f90 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x351ca0 '__int128'
|-TypedefDecl 0x352000 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x351cc0 'unsigned __int128'
|-TypedefDecl 0x352098 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x352060 'char *'
|   `-BuiltinType 0x351ae0 'char'
|-TypedefDecl 0x352100 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
| `-PointerType 0x352060 'char *'
|   `-BuiltinType 0x351ae0 'char'
|-VarDecl 0x352170 <D:\workspace\clangparser\Release\main.cpp:2:1, col:7> col:7 used REG_A 'int'
|-VarDecl 0x352238 <line:3:1, line:1:13> line:3:7 REG_B 'float' cinit
| `-ImplicitCastExpr 0x3522c0 <line:1:13> 'float' <IntegralToFloating>
|   `-IntegerLiteral 0x3522a0 <col:13> 'int' 10
|-VarDecl 0x352360 <line:4:1, col:22> col:7 REG_C 'int [10]' cinit
| `-InitListExpr 0x352430 <col:20, col:22> 'int [10]'
|   |-array filler
|   | `-ImplicitValueInitExpr 0x352478 <<invalid sloc>> 'int'
|   `-IntegerLiteral 0x3523c8 <col:21> 'int' 0
|-FunctionDecl 0x3524e0 <line:6:1, col:19> col:13 autog 'void (void)' extern
`-FunctionDecl 0x3526a0 <line:8:1, line:10:1> line:8:6 autoGen 'void (const float &)'
  |-ParmVarDecl 0x3525d0 <col:14, col:27> col:27 in 'const float &'
  `-CompoundStmt 0x4d6e300 <col:31, line:10:1>
    `-BinaryOperator 0x4d6e2d8 <line:9:2, line:1:13> 'int' lvalue '='
      |-DeclRefExpr 0x4d6e290 <line:9:2> 'int' lvalue Var 0x352170 'REG_A' 'int'
      `-IntegerLiteral 0x4d6e2b8 <line:1:13> 'int' 10

clang -Xclang -ast-dump -fsyntax-only main.cpp

TranslationUnitDecl 0x37a6ed0 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x37a7420 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x37a7130 '__int128'
|-TypedefDecl 0x37a7490 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x37a7150 'unsigned __int128'
|-TypedefDecl 0x37a7528 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x37a74f0 'char *'
|   `-BuiltinType 0x37a6f70 'char'
|-TypedefDecl 0x37a7590 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
| `-PointerType 0x37a74f0 'char *'
|   `-BuiltinType 0x37a6f70 'char'
|-VarDecl 0x37a7600 <main.cpp:2:1, col:7> col:7 used REG_A 'int'
|-VarDecl 0x37a76c8 <line:3:1, line:1:13> line:3:7 REG_B 'float' cinit
| `-ImplicitCastExpr 0x37a7750 <line:1:13> 'float' <IntegralToFloating>
|   `-IntegerLiteral 0x37a7730 <col:13> 'int' 10
|-VarDecl 0x37a77f0 <line:4:1, col:22> col:7 REG_C 'int [10]' cinit
| `-InitListExpr 0x37a78c0 <col:20, col:22> 'int [10]'
|   |-array filler
|   | `-ImplicitValueInitExpr 0x37a7908 <<invalid sloc>> 'int'
|   `-IntegerLiteral 0x37a7858 <col:21> 'int' 0
|-FunctionDecl 0x37a7970 <line:6:1, col:19> col:13 autog 'void (void)' extern
`-FunctionDecl 0x37a7b30 <line:8:1, line:10:1> line:8:6 autoGen 'void (const float &)'
  |-ParmVarDecl 0x37a7a60 <col:14, col:27> col:27 in 'const float &'
  `-CompoundStmt 0x5362c40 <col:31, line:10:1>
    `-BinaryOperator 0x5362c18 <line:9:2, line:1:13> 'int' lvalue '='
      |-DeclRefExpr 0x5362bd0 <line:9:2> 'int' lvalue Var 0x37a7600 'REG_A' 'int'
      `-IntegerLiteral 0x5362bf8 <line:1:13> 'int' 10

25 9 月, 201625 9 月, 2016

Stereo Matching using OpenCL (2)

同上一篇，但是把所有 search range 平行化到每個 work-item，結論是沒有比較快…

閱讀全文〈Stereo Matching using OpenCL (2)〉

24 9 月, 20163 10 月, 2016

Stereo Matching using OpenCL

繼上一篇的OpenCL版本，速度快了不少，但是顯卡會 freeze ….修改 TDR Level 讓 driver可以跑久一點

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrLevel
ValueType : REG_DWORD
ValueData : TdrLevelOff (0) - Detection disabled 
 TdrLevelBugcheck (1) - Bug check on detected timeout, for example, no recovery.
 TdrLevelRecoverVGA (2) - Recover to VGA (not implemented).
 TdrLevelRecover (3) - Recover on timeout. This is the default value.

code

閱讀全文〈Stereo Matching using OpenCL〉

6 8 月, 20167 8 月, 2016

OpenCL Box Filtering

一個很 tricky 的點跟 CPU 計算不一樣，這邊如果把 main.cpp 呼叫 kernel 的 devBy 全部換成 devBx，在 thread 數不超過顯卡 core 負荷時(也就是一個 round 可以跑完 kernel )，結果是不會變的。可以嘗試在大圖需要多個 round 才能跑完 kernel 時就會出錯。

其實因為所有 thread 同步 inplace filtering 寫入關係，當寫入任一點時其他點的 source 已經不會需要再用到，所以不會有 propagate 的問題

讀寫檔案使用 OpenCV，減少 memory access 次數所以 x 和 y 方向 filtering 分開

閱讀全文〈OpenCL Box Filtering〉

4 8 月, 20166 8 月, 2016