Saturday, October 31, 2020

[ Python FAQ ] How do I insert a column at a specific column index in pandas?

 Source From Here

Question
Can I insert a column at a specific column index in pandas?
    import pandas as pd
    df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
    df['n'] = 0

This will put column n as the last column of df, but isn't there a way to tell df to put n at the beginning?

HowTo
see docs: http://pandas.pydata.org/pandas-docs/stable/genera...d/pandas.DataFrame.insert.html

Using loc=0 will insert at the beginning:
    df.insert(loc, column, value)
For example:
    df = pd.DataFrame({'B': [1, 2, 3], 'C': [4, 5, 6]})

    df
    Out:
       B  C
    0  1  4
    1  2  5
    2  3  6

    idx = 0
    new_col = [7, 8, 9]  # can be a list, a Series, an array or a scalar
    df.insert(loc=idx, column='A', value=new_col)

    df
    Out:
       A  B  C
    0  7  1  4
    1  8  2  5
    2  9  3  6
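
Applied to the DataFrame from the question, a minimal sketch might look like the following. It assumes pandas is installed. Note that insert modifies the DataFrame in place (it returns None) and raises ValueError when the column name already exists, unless allow_duplicates=True is passed.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({'l': ['a', 'b', 'c', 'd'], 'v': [1, 2, 1, 2]})

# Insert column 'n' at position 0; the scalar 0 is broadcast to every row.
df.insert(0, 'n', 0)
print(list(df.columns))

# Inserting the same column name again is refused by default.
try:
    df.insert(0, 'n', 1)
except ValueError as exc:
    print('insert refused:', exc)
```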

Friday, October 23, 2020

[ Linux FAQ ] Getting the count of unique values in a column in bash

 Source From Here

Question
I have tab delimited files with several columns. I want to count the frequency of occurrence of the different values in a column for all the files in a folder and sort them in decreasing order of count (highest count first). How would I accomplish this in a Linux command line environment?

It can use any common command line language like awk, perl, python etc.

HowTo
To see a frequency count for column two (for example):
# awk -F '\t' '{print $2}' * | sort | uniq -c | sort -nr

fileA.txt
    z    z    a
    a    b    c
    w    d    e

fileB.txt
    t    r    e
    z    d    a
    a    g    c

fileC.txt
    z    r    a
    v    d    c
    a    m    c

Result:
    3 d
    2 r
    1 z
    1 m
    1 g
    1 b
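
Since the question also allows Python, the same count can be sketched with collections.Counter. The function name column_counts and the temporary sample files are assumptions made for illustration. Rows with fewer columns than requested are skipped here, whereas awk would count them as an empty value, and values with equal counts may come out in a different order than with sort -nr.

```python
import tempfile
from collections import Counter
from pathlib import Path


def column_counts(folder, column=2):
    """Count the values of a 1-based column across tab-delimited files."""
    counts = Counter()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) >= column:  # skip rows that are too short
                    counts[fields[column - 1]] += 1
    # most_common() sorts by count, highest first
    return counts.most_common()


# Recreate the three sample files in a temporary folder.
samples = {
    "fileA.txt": "z\tz\ta\na\tb\tc\nw\td\te\n",
    "fileB.txt": "t\tr\te\nz\td\ta\na\tg\tc\n",
    "fileC.txt": "z\tr\ta\nv\td\tc\na\tm\tc\n",
}
with tempfile.TemporaryDirectory() as folder:
    for name, text in samples.items():
        Path(folder, name).write_text(text)
    result = column_counts(folder)

for value, count in result:
    print(count, value)
```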


Friday, October 16, 2020

[ FAQ ] How to set GOPRIVATE environment variable

 Source From Here

Question
I started working on a Go project that uses some private modules from GitHub private repos. Whenever I try to run go run main.go, it gives me the 410 Gone error below:
verifying github.com/repoURL/go-proto@v2.86.0+incompatible/go.mod: github.com/repoURL/go-proto@v2.86.0+incompatible/go.mod: reading https://sum.golang.org/lookup/github.com/!repoURL/go-proto@v2.86.0+incompatible: 410 Gone

I can easily clone the private repo from a terminal, which means my ssh keys are configured correctly. I read here that I need to set the GOPRIVATE environment variable, but I am not sure how to do that.

HowTo

Short Answer:
# go env -w GOPRIVATE=github.com/repoURL/private-repo

Or, if you want to allow all private repos from your organization:
# go env -w GOPRIVATE=github.com/<OrgNameHere>/*

Long Answer:
Check "Module configuration for non-public modules" for more information:
The GOPRIVATE environment variable controls which modules the go command considers to be private (not available publicly) and should therefore not use the proxy or checksum database. The variable is a comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes. For example,
    GOPRIVATE=*.corp.example.com,rsc.io/private
causes the go command to treat as private any module with a path prefix matching either pattern, including git.corp.example.com/xyzzy, rsc.io/private, and rsc.io/private/quux.
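
The prefix matching described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the go command's implementation: the function name matches_private is hypothetical, and Python's fnmatch is used as a stand-in for Go's path.Match, whose character-class syntax differs slightly. Each pattern is compared against the leading path elements of the module path, one element at a time, so that "*" never matches across a "/".

```python
from fnmatch import fnmatchcase


def matches_private(module_path, goprivate):
    """Return True if a path prefix of module_path matches any pattern."""
    parts = module_path.split("/")
    for pattern in goprivate.split(","):
        pattern = pattern.strip()
        if not pattern:
            continue
        pattern_parts = pattern.split("/")
        if len(pattern_parts) > len(parts):
            continue  # the pattern is longer than the module path
        # zip() stops at the end of the pattern, so only the prefix with
        # the same number of path elements is compared.
        if all(fnmatchcase(part, pat)
               for part, pat in zip(parts, pattern_parts)):
            return True
    return False


goprivate = "*.corp.example.com,rsc.io/private"
for path in ("git.corp.example.com/xyzzy", "rsc.io/private",
             "rsc.io/private/quux", "rsc.io/public"):
    print(path, matches_private(path, goprivate))
```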

The 'go env -w' command (see 'go help env') can be used to set these variables for future go command invocations.

Note on the usage of ssh:
If you use ssh to access a (locally hosted) git repo, you might want to add the following to your ~/.gitconfig:
    [url "ssh://git@git.local.intranet/"]
           insteadOf = https://git.local.intranet/
so that the go commands are able to access the git server.

[ Article Collection ] Go by Example: Signals

 Source From Here

Preface
Sometimes we’d like our Go programs to intelligently handle Unix signals. For example, we might want a server to gracefully shut down when it receives a SIGTERM, or a command-line tool to stop processing input if it receives a SIGINT. Here’s how to handle signals in Go with channels.

HowTo
  1. package main  
  2.   
  3. import (  
  4.     "fmt"  
  5.     "os"  
  6.     "os/signal"  
  7.     "syscall"  
  8. )  
  9.   
  10. func main() {  
  11.   
  12.     sigs := make(chan os.Signal, 1)  
  13.     done := make(chan bool, 1)  
  14.   
  15.     signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)  
  16.   
  17.     go func() {  
  18.         sig := <-sigs  
  19.         fmt.Println()  
  20.         fmt.Println(sig)  
  21.         done <- true  
  22.     }()  
  23.   
  24.     fmt.Println("awaiting signal")  
  25.     <-done  
  26.     fmt.Println("exiting")  
  27. }  
Go signal notification works by sending os.Signal values on a channel. We’ll create a channel to receive these notifications (we’ll also make one to notify us when the program can exit).

* Line 15
signal.Notify registers the given channel to receive notifications of the specified signals.

* Lines 17-21
This goroutine executes a blocking receive for signals. When it gets one it’ll print it out and then notify the program that it can finish.

* Lines 24-26
The program will wait here until it gets the expected signal (as indicated by the goroutine above sending a value on done) and then exit.


When we run this program it will block waiting for a signal. By typing ctrl-C (which the terminal shows as ^C) we can send a SIGINT signal, causing the program to print interrupt and then exit:
$ go run signals.go
awaiting signal
^C
interrupt
exiting


[ Python FAQ ] Unable to allocate array with shape and data type

 Source From Here

Question
I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18, while I don't face the same issue on MacOS. I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with:
    np.zeros((156816, 36, 53806), dtype='uint8')
and I get the following error on Ubuntu:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806)

I've read somewhere that np.zeros shouldn't really allocate the whole memory needed for the array, but only the memory for the non-zero elements. Note that the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.

HowTo
This is likely due to your system's overcommit handling mode.

In the default mode, 0,
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.


The exact heuristic used is not well explained here, but this is discussed more on Linux over commit heuristic and on this page.

You can check your current overcommit mode by running
$ cat /proc/sys/vm/overcommit_memory
0
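
The same check can be done from Python, which may be handy inside a script that is about to allocate a large array. This is a minimal sketch: the function name overcommit_mode and the short descriptions are illustrative, and the file only exists on Linux, so the function returns None elsewhere.

```python
from pathlib import Path

# Short summaries of the three kernel overcommit modes.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "don't overcommit (strict accounting)",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, description), or None when the file does not exist."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    mode = int(proc_file.read_text().strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


print(overcommit_mode())
```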

In this case you're allocating
>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588

~282 GB, and the kernel is effectively saying "obviously there's no way I'm going to be able to commit that many physical pages to this", so it refuses the allocation.
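
The arithmetic can be spelled out without numpy. In this sketch the shape comes from the question and the item size of 1 byte follows from dtype uint8; the 64 GB figure is the RAM the question reports for the Ubuntu machine.

```python
import math

shape = (156816, 36, 53806)
itemsize = 1  # bytes per element for dtype uint8

# Total size is the product of all dimensions times the element size.
total_bytes = math.prod(shape) * itemsize
total_gib = total_bytes / 1024**3

ram_gib = 64  # physical memory of the Ubuntu machine in the question
print(f"{total_bytes} bytes = {total_gib:.1f} GiB")
print(f"that is {total_gib / ram_gib:.1f}x the installed RAM")
```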

If (as root) you run:
$ echo 1 > /proc/sys/vm/overcommit_memory

This will enable "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing at least).

[ Python FAQ ] When using unittest.mock.patch, why is autospec not True by default?

 Source From Here
Question: When you patch a function using mock, you have the option to specify autospec as True: If you set autospec...