Friday, October 23, 2020

[Linux FAQ] Getting the count of unique values in a column in bash

 Source From Here

Question
I have tab delimited files with several columns. I want to count the frequency of occurrence of the different values in a column for all the files in a folder and sort them in decreasing order of count (highest count first). How would I accomplish this in a Linux command line environment?

It can use any common command line language like awk, perl, python etc.

HowTo
To see a frequency count for column two (for example):
# awk -F '\t' '{print $2}' * | sort | uniq -c | sort -nr

fileA.txt
z    z    a
a    b    c
w    d    e

fileB.txt
t    r    e
z    d    a
a    g    c

fileC.txt
z    r    a
v    d    c
a    m    c

Result:
3 d
2 r
1 z
1 m
1 g
1 b
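Since the question allows Python as well, the same count can be sketched with collections.Counter. This is a minimal sketch, not a drop-in replacement for the awk pipeline: the function name column_counts and the inline sample rows are made up for illustration, rows with too few fields are skipped (awk would print an empty value instead), and ties are listed in first-seen order rather than in the order sort -nr produces.

```python
from collections import Counter


def column_counts(lines, column=2, sep="\t"):
    """Count the values found in a 1-based column of delimited lines.

    Returns (value, count) pairs, highest count first.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip rows that do not have the requested column.
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts.most_common()


# Hypothetical sample: the rows of fileA.txt, fileB.txt and fileC.txt above.
sample = [
    "z\tz\ta", "a\tb\tc", "w\td\te",
    "t\tr\te", "z\td\ta", "a\tg\tc",
    "z\tr\ta", "v\td\tc", "a\tm\tc",
]

for value, count in column_counts(sample):
    print(count, value)
```

To run it over real files, read each file's lines into one list (or chain the open file objects) and pass that to column_counts.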


Friday, October 16, 2020

[ FAQ ] How to set GOPRIVATE environment variable

 Source From Here

Question
I started working on a Go project that uses some private modules from GitHub private repos. Whenever I try to run go run main.go, it fails with the following 410 Gone error:
verifying github.com/repoURL/go-proto@v2.86.0+incompatible/go.mod: github.com/repoURL/go-proto@v2.86.0+incompatible/go.mod: reading https://sum.golang.org/lookup/github.com/!repoURL/go-proto@v2.86.0+incompatible: 410 Gone

I can easily clone the private repo from the terminal, which means my SSH keys are configured correctly. I read here that I need to set the GOPRIVATE environment variable, but I am not sure how to do that.

HowTo

Short Answer:
# go env -w GOPRIVATE=github.com/repoURL/private-repo

Or, if you want to allow all private repos from your organization:
# go env -w GOPRIVATE=github.com/<OrgNameHere>/*

Long Answer:
Check "Module configuration for non-public modules" for more information:
The GOPRIVATE environment variable controls which modules the go command considers to be private (not available publicly) and should therefore not use the proxy or checksum database. The variable is a comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes. For example,
GOPRIVATE=*.corp.example.com,rsc.io/private
causes the go command to treat as private any module with a path prefix matching either pattern, including git.corp.example.com/xyzzy, rsc.io/private, and rsc.io/private/quux.

The 'go env -w' command (see 'go help env') can be used to set these variables for future go command invocations.
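To make the prefix-matching rule above concrete, here is a rough Python approximation of it. It is only a sketch for checking patterns by hand, not the go command's own code: the helper names are invented, and only the * and ? wildcards are handled (character classes and escapes from path.Match are left out). The idea is that a pattern with N slashes is compared against the first N+1 elements of the module path, and a wildcard never crosses a slash.

```python
import re


def _glob_to_regex(glob):
    """Translate a simple glob into a regex; '*' and '?' never match '/'."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_private(patterns, module_path):
    """Return True if module_path has a prefix matching any listed pattern.

    patterns is a comma-separated list, as in the GOPRIVATE variable.
    """
    elements = module_path.split("/")
    for glob in patterns.split(","):
        if not glob:
            continue
        depth = glob.count("/") + 1  # number of path elements in the pattern
        if len(elements) < depth:
            continue  # module path is too short to match this pattern
        prefix = "/".join(elements[:depth])
        if _glob_to_regex(glob).fullmatch(prefix):
            return True
    return False


patterns = "*.corp.example.com,rsc.io/private"
for path in ("git.corp.example.com/xyzzy", "rsc.io/private/quux", "rsc.io/public"):
    print(path, is_private(patterns, path))
```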

Note on the usage of SSH:
If you use SSH to access a locally hosted Git repo, you might want to add the following to your ~/.gitconfig so that the go commands can access the Git server:
[url "ssh://git@git.local.intranet/"]
       insteadOf = https://git.local.intranet/

[ Article Collection ] Go by Example: Signals

 Source From Here

Preface
Sometimes we’d like our Go programs to intelligently handle Unix signals. For example, we might want a server to gracefully shutdown when it receives a SIGTERM, or a command-line tool to stop processing input if it receives a SIGINT. Here’s how to handle signals in Go with channels.

HowTo
  1. package main  
  2.   
  3. import (  
  4.     "fmt"  
  5.     "os"  
  6.     "os/signal"  
  7.     "syscall"  
  8. )  
  9.   
  10. func main() {  
  11.   
  12.     sigs := make(chan os.Signal, 1)  
  13.     done := make(chan bool, 1)  
  14.   
  15.     signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)  
  16.   
  17.     go func() {  
  18.         sig := <-sigs  
  19.         fmt.Println()  
  20.         fmt.Println(sig)  
  21.         done <- true  
  22.     }()  
  23.   
  24.     fmt.Println("awaiting signal")  
  25.     <-done  
  26.     fmt.Println("exiting")  
  27. }  
Go signal notification works by sending os.Signal values on a channel. We’ll create a channel to receive these notifications (we’ll also make one to notify us when the program can exit).

* Line 15
signal.Notify registers the given channel to receive notifications of the specified signals.

* Lines 17-21
This goroutine executes a blocking receive for signals. When it gets one it’ll print it out and then notify the program that it can finish.

* Lines 24-26
The program will wait here until it gets the expected signal (as indicated by the goroutine above sending a value on done) and then exit.


When we run this program it will block waiting for a signal. By typing ctrl-C (which the terminal shows as ^C) we can send a SIGINT signal, causing the program to print interrupt and then exit:
$ go run signals.go
awaiting signal
^C
interrupt
exiting


[ Python FAQ ] Unable to allocate array with shape and data type

 Source From Here

Question
I'm having trouble allocating huge arrays in numpy on Ubuntu 18, while the same code works on macOS. I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with:
np.zeros((156816, 36, 53806), dtype='uint8')
and I get this error on Ubuntu:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806)

I've read somewhere that np.zeros shouldn't really allocate the whole memory needed for the array, only the memory for the non-zero elements. The failure is also surprising because the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.

HowTo
This is likely due to your system's overcommit handling mode.

In the default mode, 0,
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.


The exact heuristic used is not well explained here, but it is discussed further in "Linux over commit heuristic" and on this page.

You can check your current overcommit mode by running
$ cat /proc/sys/vm/overcommit_memory
0

In this case you're allocating
>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588

~282 GiB, and the kernel is effectively saying "well, obviously there's no way I'm going to be able to commit that many physical pages to this", so it refuses the allocation.
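The same arithmetic can be scripted so that the size of a dense array is known before any allocation is attempted. The sketch below uses only the standard library; the function names are invented for illustration, and itemsize=1 is an assumption that matches the uint8 dtype in the question (one byte per element). The second helper simply reads the /proc file shown above and returns None where that file does not exist, for example on macOS.

```python
import math


def dense_array_gib(shape, itemsize=1):
    """Size in GiB of a dense array with this shape, without allocating it.

    itemsize is the number of bytes per element (1 for uint8).
    """
    return math.prod(shape) * itemsize / 1024**3


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the kernel overcommit mode as an int, or None if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


print(f"{dense_array_gib((156816, 36, 53806)):.1f} GiB requested")
print("overcommit mode:", overcommit_mode())
```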

If you run the following as root:
$ echo 1 > /proc/sys/vm/overcommit_memory

it enables "always overcommit" mode, and you'll find that the system will indeed allow you to make the allocation no matter how large it is (within 64-bit memory addressing, at least).

Wednesday, October 14, 2020

[ Python FAQ ] How to tell if tensorflow is using gpu acceleration from inside python shell?

 Source From Here

Question
I have installed TensorFlow on my Ubuntu 16.04 machine using the second answer here, with Ubuntu's built-in apt CUDA installation.

Now my question is: how can I test whether TensorFlow is really using the GPU? I have a GTX 960M GPU. When I import tensorflow, this is the output:
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

Is this output enough to check whether TensorFlow is using the GPU?

HowTo
To find out which device is used, you can enable log device placement like this:
>>> import tensorflow as tf
>>> sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
...
2020-10-14 12:06:45.480226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-10-14 12:06:45.480280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 12:06:45.481261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-14 12:06:45.481291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-10-14 12:06:45.481301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-10-14 12:06:45.481455: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 12:06:45.481841: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 12:06:45.482189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 293 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
...
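A quicker check than reading the placement log is to ask TensorFlow which GPUs it can see. This is a minimal sketch, assuming TensorFlow 2.1 or newer, where tf.config.list_physical_devices is available; the wrapper name visible_gpus is made up here, and it returns None when TensorFlow is not installed. An empty list means TensorFlow found no usable GPU, which is a different question from where a particular op actually ran.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed, and an empty list if it is
    installed but no GPU is visible. Assumes TensorFlow 2.1 or newer.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```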


[ Python FAQ ] Can Keras with Tensorflow backend be forced to use CPU or GPU at will?

 Source From Here

Question
I have Keras installed with the TensorFlow backend and CUDA. I'd sometimes like to force Keras to use the CPU on demand. Can this be done without, say, installing a separate CPU-only TensorFlow in a virtual environment? If so, how? If the backend were Theano, the flags could be set, but I have not heard of TensorFlow flags accessible via Keras.

HowTo
If you want to force Keras to use CPU:

Way 1
Set these variables before Keras / TensorFlow is imported:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Way 2
Run your script as
$ CUDA_VISIBLE_DEVICES="" ./your_keras_code.py
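Both ways rely on the same two environment variables, so the choice between CPU and specific GPUs can be wrapped in one small helper. This is only a sketch: select_devices is a hypothetical name, and it has no effect unless it is called before Keras or TensorFlow is imported. An empty CUDA_VISIBLE_DEVICES hides every GPU, while a comma-separated list of indices exposes only those devices.

```python
import os


def select_devices(gpu_ids=None, env=os.environ):
    """Hide all GPUs (gpu_ids is None or empty) or expose only the listed ones.

    Must be called before Keras / TensorFlow is imported.
    Returns the value written to CUDA_VISIBLE_DEVICES.
    """
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in (gpu_ids or []))
    return env["CUDA_VISIBLE_DEVICES"]


# Demonstrated on a scratch dict so the real environment is left untouched.
scratch = {}
print(repr(select_devices(env=scratch)))       # CPU only
print(repr(select_devices([0], env=scratch)))  # first GPU only
```

In a real script, call select_devices() with no env argument at the very top of the file, then import Keras.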

See also
https://github.com/keras-team/keras/issues/152
https://github.com/fchollet/keras/issues/4613

Tuesday, October 13, 2020

[ FAQ ] Substring: How to Split a String

 Source From Here

Preface
In the example below we are looking at how to take the first x number of characters from the start of a string. If we know a character we want to separate on, like a space, we can use strings.Split() instead. But for this we’re looking to get the first 6 characters as a new string.

HowTo
To do this we first convert the string into a slice of runes, which handles multi-byte characters from other languages correctly and lets us index it like an array. Then we pick the first six characters using [0:6] and convert the result back to a string.

Split Based on Position:
package main

import (
    "fmt"
)

func main() {

    myString := "Hello! This is a golangcode.com test ;)"

    // Step 1: Convert the string to a slice of runes
    a := []rune(myString)

    // Step 2: Take the first 6 runes and convert them back to a string
    myShortString := string(a[0:6])

    fmt.Println(myShortString)
}
Output:
Hello!

Split Based on Character:
The alternative way, using the strings package would be:
package main

import (
    "fmt"
    "strings"
)

func main() {

    myString := "Hello! This is a golangcode.com test ;)"

    strParts := strings.Split(myString, "!")

    fmt.Println(strParts[0])
}
Output:
Hello

Supplement
The Go Blog - Strings, bytes, runes and characters in Go
The Go language defines the word rune as an alias for the type int32, so programs can be clear when an integer value represents a code point. Moreover, what you might think of as a character constant is called a rune constant in Go...

