Friday, November 29, 2019

[Linux FAQ] How to Change Linux User's Password in One Command Line

Source From Here
Preface
In Linux, we use passwd to change a password, but passwd requires input from stdin to get the new password. It is a common case that we want to change a password non-interactively, such as when creating new user accounts and setting or changing the passwords for these accounts on a number of Linux boxes, where the user creation itself can already be done in one command line. With the help of a pipe and a little trick, we can change a user's password in one command line as well. This saves much time, especially when creating a batch of user accounts.

How-To
We use one example to introduce how to change a Linux user's password in one command line. Suppose we log in as root and want to change user linuxuser's password to linuxpassword.

The passwd command asks for the new password twice, and these two inputs (the same password) are separated by one "Enter". We can emulate this with the echo command and its -e option: when -e is in effect, escaped characters are interpreted, so \n in echo's input is echoed as a newline. In addition, on Linux systems whose passwd supports it (common on Red Hat-based distributions, but not universal), you can use the --stdin option to let passwd accept the password from stdin instead of asking for it twice.

So to change the password in our example, we just execute this one command:
# echo "linuxpassword" | passwd --stdin linuxuser

on modern Linux. (Thanks to DAVID for this tip)
or
# echo -e "linuxpassword\nlinuxpassword" | passwd linuxuser

This can also be put into a bash script or executed on remote nodes via the ssh command. For example, we can change the password of linuxuser on a batch of servers (100 servers: 10.1.0.1 to 10.1.0.100) by:
# for ((i=1;i<=100;i++)); do \
ssh 10.1.0.$i 'echo -e "linuxpassword\nlinuxpassword" | passwd linuxuser'; \
done;

Even further, we can create one user and set its initial password remotely by:
# ssh remoteserver \
'useradd newuser; echo -e "passwdofuser\npasswdofuser" | passwd newuser'

If you want to update your own password as a normal user, you may use
$ echo -e "your_current_pass\nlinuxpassword\nlinuxpassword" | passwd

Security notes
You must be aware that the full command line can be viewed by all users on the Linux system (for example, in the process list), so the password passed on the command line can potentially be leaked. Only consider using the method here for cases where this is acceptable.

Alternative method using chpasswd
chpasswd is a nice tool for changing a batch of accounts' passwords on one Linux box. It can be used to change a single user's password in one command line too; check its manual for the details.
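
As a minimal sketch (run as root; the user name and password are the same illustrative values as above), chpasswd reads username:password pairs from stdin:
# echo "linuxuser:linuxpassword" | chpasswd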

Thursday, November 28, 2019

[ Python Articles ] Python 3: Getting Back the cmp Parameter That Disappeared from sort()

Source From Here
Problem Description
When you happily sort with the cmp parameter of list's sort() method, as you could in Python 2:
nums = [1, 3, 2, 4]
nums.sort(cmp=lambda a, b: a - b)
print(nums)  # [1, 2, 3, 4]
you find that it suddenly fails under Python 3:
Traceback (most recent call last):
  File "temp.py", line 2, in <module>
    nums.sort(cmp=lambda a, b: a - b)
TypeError: 'cmp' is an invalid keyword argument for this function

The cmp parameter is gone? How are we supposed to sort with a comparison function now?

Cause Analysis
A quick look at the official documentation shows that sort() under Python 2 is:
sort(cmp=None, key=None, reverse=False)
while under Python 3 it is:
sort(*, key=None, reverse=False)
So Python 3 really did remove the cmp parameter from sort(); it was dropped in order to simplify and unify the Python language.

Solution
To spare the feelings of the many cmp users out there, Python 3 does leave a way out: functools.cmp_to_key(), which wraps a comparison function into a key function. Without further ado, here is the code:
>>> from functools import cmp_to_key
>>> nums = [1, 3, 2, 4]
>>> nums.sort(key=cmp_to_key(lambda a, b: a - b))
>>> nums
[1, 2, 3, 4]

>>> nums.sort(key=cmp_to_key(lambda a, b: b - a))
>>> nums
[4, 3, 2, 1]

If you are using the sorted function instead, you can do the following:
>>> nums = [1, 3, 2, 4]
>>> sorted(nums, key=lambda e: e % 2)
[2, 4, 1, 3]

>>> sorted(nums, key=lambda e: 1 - (e % 2))
[1, 3, 2, 4]
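
The same cmp_to_key() wrapper also works with sorted(); assuming the nums list and the cmp_to_key import from above, for example:
>>> sorted(nums, key=cmp_to_key(lambda a, b: b - a))
[4, 3, 2, 1]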


Reference
Data Structures; Python 2.7.13 documentation
Built-in Types; Python 3.6.1 documentation
Sorting HOW TO; Python 3.6.1 documentation
functools — Higher-order functions and operations on callable objects; Python 3.6.1 documentation

Saturday, November 23, 2019

[Py DS] Ch5 - Machine Learning (Part5)

In Depth: Naive Bayes Classification
The previous four sections have given a general overview of the concepts of machine learning. In this section and the ones that follow, we will be taking a closer look at several specific algorithms for supervised and unsupervised learning, starting here with naive Bayes classification.

Naive Bayes models are a group of extremely fast and simple classification algorithms that are often suitable for very high-dimensional datasets. Because they are so fast and have so few tunable parameters, they end up being very useful as a quick-and-dirty baseline for a classification problem. This section will focus on an intuitive explanation of how naive Bayes classifiers work, followed by a couple of examples of them in action on some datasets.

Bayesian Classification
Naive Bayes classifiers are built on Bayesian classification methods. These rely on Bayes's theorem, which is an equation describing the relationship of conditional probabilities of statistical quantities. In Bayesian classification, we're interested in finding the probability of a label given some observed features, which we can write as P(L | features). Bayes's theorem tells us how to express this in terms of quantities we can compute more directly:

    P(L | features) = P(features | L) P(L) / P(features)

If we are trying to decide between two labels, L1 and L2, the term P(features), which does not depend on the label, cancels out, and all we need is the ratio of the posterior probabilities:

    P(L1 | features) / P(L2 | features) = [ P(features | L1) P(L1) ] / [ P(features | L2) P(L2) ]

All we need now is some model by which we can compute P(features|Li) for each label. Such a model is called a generative model because it specifies the hypothetical random process that generates the data. Specifying this generative model for each label is the main piece of the training of such a Bayesian classifier. The general version of such a training step is a very difficult task, but we can make it simpler through the use of some simplifying assumptions about the form of this model.

This is where the “naive” in “naive Bayes” comes in: if we make very naive assumptions about the generative model for each label, we can find a rough approximation of the generative model for each class, and then proceed with the Bayesian classification. Different types of naive Bayes classifiers rest on different naive assumptions about the data, and we will examine a few of these in the following sections. We begin with the standard imports:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
Gaussian Naive Bayes
Perhaps the easiest naive Bayes classifier to understand is Gaussian naive Bayes. In this classifier, the assumption is that data from each label is drawn from a simple Gaussian distribution. Imagine that you have the following data (Figure 5-38):
from sklearn.datasets import make_blobs
X, y = make_blobs(100, 2, centers=2, random_state=2, cluster_std=1.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='RdBu');


Figure 5-38. Data for Gaussian naive Bayes classification

One extremely fast way to create a simple model is to assume that the data is described by a Gaussian distribution with no covariance between dimensions. We can fit this model by simply finding the mean and standard deviation of the points within each label, which is all you need to define such a distribution. The result of this naive Gaussian assumption is shown in Figure 5-39.
Figure 5-39. Visualization of the Gaussian naive Bayes model


The ellipses here represent the Gaussian generative model for each label, with larger probability toward the center of the ellipses. With this generative model in place for each class, we have a simple recipe to compute the likelihood P(features | L1) for any data point, and thus we can quickly compute the posterior ratio and determine which label is the most probable for a given point.
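
As a rough illustrative sketch (not from the text), the "fit" here is nothing more than the per-label means and standard deviations, and the likelihood of a point is a product of independent Gaussians; assuming the X, y generated above and that SciPy is available:
from scipy import stats

# Per-label means and standard deviations: the "fit" of the naive Gaussian model
means = {label: X[y == label].mean(axis=0) for label in (0, 1)}
stds = {label: X[y == label].std(axis=0) for label in (0, 1)}

# Log-likelihood of one point under each label's independent-Gaussian model
point = X[0]
for label in (0, 1):
    log_like = stats.norm(means[label], stds[label]).logpdf(point).sum()
    print(label, log_like)  # the larger value marks the more probable label (ignoring priors)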

This procedure is implemented in Scikit-Learn’s sklearn.naive_bayes.GaussianNB estimator:
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X, y);
Now let’s generate some new data and predict the label:
rng = np.random.RandomState(0)
Xnew = [-6, -14] + [14, 18] * rng.rand(2000, 2)
ynew = model.predict(Xnew)
Now we can plot this new data to get an idea of where the decision boundary is (Figure 5-40):
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='RdBu')
lim = plt.axis()
plt.scatter(Xnew[:, 0], Xnew[:, 1], c=ynew, s=20, cmap='RdBu', alpha=0.1)
plt.axis(lim);

Figure 5-40. Visualization of the Gaussian naive Bayes classification


We see a slightly curved boundary in the classifications—in general, the boundary in Gaussian naive Bayes is quadratic. A nice piece of this Bayesian formalism is that it naturally allows for probabilistic classification, which we can compute using the predict_proba method:
yprob = model.predict_proba(Xnew)
yprob[-8:].round(2)
Output:
array([[0.89, 0.11],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [0.  , 1.  ],
       [0.15, 0.85]])
The columns give the posterior probabilities of the first and second label, respectively. If you are looking for estimates of uncertainty in your classification, Bayesian approaches like this can be very useful. Of course, the final classification will only be as good as the model assumptions that lead to it, which is why Gaussian naive Bayes often does not produce very good results. Still, in many cases—especially as the number of features becomes large—this assumption is not detrimental enough to prevent Gaussian naive Bayes from being a useful method.

Multinomial Naive Bayes
The Gaussian assumption just described is by no means the only simple assumption that could be used to specify the generative distribution for each label. Another useful example is multinomial naive Bayes, where the features are assumed to be generated from a simple multinomial distribution. The multinomial distribution describes the probability of observing counts among a number of categories, and thus multinomial naive Bayes is most appropriate for features that represent counts or count rates.

The idea is precisely the same as before, except that instead of modeling the data distribution with the best-fit Gaussian, we model the data distribution with a best-fit multinomial distribution.
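
As a quick illustrative sketch (with made-up count data, not from the text), fitting a multinomial naive Bayes model on raw count features looks like this:
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# Hypothetical count features: each row is a document, each column a word count
X_counts = np.array([[2, 1, 0],
                     [3, 0, 1],
                     [0, 2, 4],
                     [1, 0, 5]])
y_toy = np.array([0, 0, 1, 1])

toy_model = MultinomialNB().fit(X_counts, y_toy)
toy_model.predict([[0, 1, 3]])  # most probable label for a new count vector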

Example: Classifying text
One place where multinomial naive Bayes is often used is in text classification, where the features are related to word counts or frequencies within the documents to be classified. We discussed the extraction of such features from text in "Feature Engineering" on page 375; here we will use the sparse word count features from the 20 Newsgroups corpus to show how we might classify these short documents into categories.

Let’s download the data and take a look at the target names:
from sklearn.datasets import fetch_20newsgroups
data = fetch_20newsgroups()
data.target_names
For simplicity, we will select just a few of these categories, and download the training and testing set:
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)
Here is a representative entry from the data:



In order to use this data for machine learning, we need to be able to convert the content of each string into a vector of numbers. For this we will use the TF–IDF vectorizer (discussed in “Feature Engineering” on page 375), and create a pipeline that attaches it to a multinomial naive Bayes classifier:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
With this pipeline, we can apply the model to the training data, and predict labels for the test data:
model.fit(train.data, train.target)
labels = model.predict(test.data)
Now that we have predicted the labels for the test data, we can evaluate them to learn about the performance of the estimator. For example, here is the confusion matrix between the true and predicted labels for the test data (Figure 5-41):
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(test.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=train.target_names, yticklabels=train.target_names)

plt.xlabel('true label')
plt.ylabel('predicted label');

Figure 5-41. Confusion matrix for the multinomial naive Bayes text classifier


Evidently, even this very simple classifier can successfully separate space talk from computer talk, but it gets confused between talk about religion and talk about Christianity. This is perhaps an expected area of confusion! The very cool thing here is that we now have the tools to determine the category for any string, using the predict() method of this pipeline. Here’s a quick utility function that will return the prediction for a single string:
def predict_category(s, train=train, model=model):
    pred = model.predict([s])
    return train.target_names[pred[0]]
Let’s try it out:
for s in [
            'sending a payload to the ISS',
            'discussing islam vs atheism',
            'determining the screen resolution'
         ]:
    cate = predict_category(s)
    print("'{}' is predicted as cate={}".format(s, cate))
Output:
'sending a payload to the ISS' is predicted as cate=sci.space
'discussing islam vs atheism' is predicted as cate=soc.religion.christian
'determining the screen resolution' is predicted as cate=comp.graphics

Remember that this is nothing more sophisticated than a simple probability model for the (weighted) frequency of each word in the string; nevertheless, the result is striking. Even a very naive algorithm, when used carefully and trained on a large set of high-dimensional data, can be surprisingly effective.

When to Use Naive Bayes
Because naive Bayesian classifiers make such stringent assumptions about data, they will generally not perform as well as a more complicated model. That said, they have several advantages:
• They are extremely fast for both training and prediction
• They provide straightforward probabilistic prediction
• They are often very easily interpretable
• They have very few (if any) tunable parameters

These advantages mean a naive Bayesian classifier is often a good choice for an initial baseline classification. If it performs suitably, then congratulations: you have a very fast, very interpretable classifier for your problem. If it does not perform well, then you can begin exploring more sophisticated models, with some baseline knowledge of how well they should perform.

Naive Bayes classifiers tend to perform especially well in one of the following situations:
• When the naive assumptions actually match the data (very rare in practice)
• For very well-separated categories, when model complexity is less important
• For very high-dimensional data, when model complexity is less important

The last two points seem distinct, but they actually are related: as the dimension of a dataset grows, it is much less likely for any two points to be found close together (after all, they must be close in every single dimension to be close overall). This means that clusters in high dimensions tend to be more separated, on average, than clusters in low dimensions, assuming the new dimensions actually add information. For this reason, simplistic classifiers like naive Bayes tend to work as well or better than more complicated classifiers as the dimensionality grows: once you have enough data, even a simple model can be very powerful.
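
As a rough numerical illustration of this point (a minimal sketch with arbitrary sample sizes, not from the text), the typical distance between uniformly random points grows with the dimension:
import numpy as np

rng = np.random.RandomState(42)
for dim in (2, 10, 100):
    pts = rng.rand(200, dim)                   # 200 random points in the unit cube
    diffs = pts[:, None, :] - pts[None, :, :]  # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(-1))      # pairwise Euclidean distances
    # average distance between distinct points increases with the dimension
    print(dim, dists[np.triu_indices(200, k=1)].mean())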
