Friday, March 30, 2018

[Linux FAQ] Limit ssh access by IP address

Source From Here 
To limit ssh access to a Linux box based on the originating IP address, edit /etc/hosts.allow:
  sshd : localhost : allow
  sshd : 192.168.0. : allow
  sshd : 99.151.250.7 : allow
  sshd : mydomain.net : allow
  sshd : ALL : deny
The above entries allow ssh access from localhost, the 192.168.0.x subnet, the single IP address 99.151.250.7, and mydomain.net (assuming mydomain.net has a PTR record in place to facilitate reverse lookup). All other IP addresses are denied access to sshd. 

Notes: 
* You can allow or deny based on ip address, subnet, or hostname.
* List rules in order of most to least specific. The file is only read until a matching line is found, so if you start with sshd : ALL : deny, no ssh connections will be allowed.
* You can control access to other tcp wrapped services as well - see the hosts.allow man page for details: http://linux.die.net/man/5/hosts.allow

[Linux Article Collection] How to change the visudo editor from nano to vim? (Ubuntu)

Source From Here 
Question 
When I use visudo, it always opens with the nano editor. How do I change the editor to vim?

How-To 
Type sudo update-alternatives --config editor 

You will see output like the one below:
  There are 4 choices for the alternative editor (providing /usr/bin/editor).

    Selection    Path                 Priority   Status
  ------------------------------------------------------------
  * 0            /bin/nano             40        auto mode
    1            /bin/ed              -100       manual mode
    2            /bin/nano             40        manual mode
    3            /usr/bin/vim.basic    30        manual mode
    4            /usr/bin/vim.tiny     10        manual mode

  Press enter to keep the current choice[*], or type selection number: 3
Find the vim.basic or vim.tiny selection number, type it, and press Enter. The next time you open visudo, your editor will be vim.

[ML Article Collection] Visualizing K-Means Clusters in Jupyter Notebooks

Source From Here 
Preface 
The information technology industry is in the middle of a powerful trend towards machine learning and artificial intelligence. These are difficult skills to master, but if you embrace them and just do it, you’ll be taking a very significant step towards advancing your career. As with any learning curve, it’s useful to start simple. The K-Means clustering algorithm is pretty intuitive and easy to understand, so in this post I’m going to describe what K-Means does, show you how to experiment with it using Spark and Python, and visualize its results in a Jupyter notebook.


What is K-Means? 
k-means clustering aims to group a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups (clusters). It operates on a table of values where every cell is a number; K-Means only supports numeric columns. In Spark those tables are usually expressed as a dataframe. A dataframe with two columns can be easily visualized on a graph where the x-axis is the first column and the y-axis is the second column. For example, here’s a two-dimensional graph for a dataframe with two columns. 
 

If you were to manually group the data in the above graph, how would you do it? You might draw two circles, like this: 
 

And in this case that is pretty close to what you get through k-means. The following figure shows how the data is segmented by running k-means on our two dimensional dataset. 
 

Charting feature columns like that can help you make intuitive sense of how k-means is segmenting your data. 
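The grouping above can be reproduced with a minimal, self-contained sketch. This uses scikit-learn's KMeans on generated blob data rather than the Spark dataframe from the post, so treat the parameter values as illustrative assumptions:

```python
# Minimal 2-D k-means sketch (scikit-learn stand-in for the Spark example)
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two numeric feature columns with three natural groups
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)

# Fit k-means with k=3 and read back one cluster label per row
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])        # a cluster index (0-2) for each of the first 10 rows
print(kmeans.cluster_centers_)    # one (x, y) centroid per cluster
```

Plotting the points colored by `kmeans.labels_` gives exactly the kind of segmented chart shown above.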

Visualizing K-Means Clusters in 3D 
The above plots were created by clustering two feature columns. There could have been other columns in our data set, but we used just two. If we want to use an additional column as a clustering feature, we would want to visualize the clusters over three dimensions. Here’s an example that shows how to visualize cluster shapes with a 3D scatter/mesh plot in a Jupyter notebook (kmeansVisualize.ipynb): 
  # Initialize the plotting library and generate an artificial 3-feature dataset
  from sklearn.datasets import make_classification
  import pandas as pd
  import plotly
  plotly.offline.init_notebook_mode()

  def rename_columns(df, prefix='x'):
      """
      Rename the columns of a dataframe to have the prefix in front of them

      :param df: data frame we're operating on
      :param prefix: the prefix string
      """
      df = df.copy()
      df.columns = [prefix + str(i) for i in df.columns]
      return df

  # Create an artificial dataset with 3 clusters for 3 feature columns
  X, Y = make_classification(n_samples=100, n_classes=3, n_features=3, n_redundant=0,
                             n_informative=3, scale=1000, n_clusters_per_class=1)
  df = pd.DataFrame(X)
  # rename X columns
  df = rename_columns(df)
  # and add the Y
  df['y'] = Y
  df.head(3)

  # Visualize cluster shapes in 3d.
  cluster1 = df.loc[df['y'] == 0]
  cluster2 = df.loc[df['y'] == 1]
  cluster3 = df.loc[df['y'] == 2]

  # One scatter3d trace per cluster (.values replaces the removed as_matrix())
  scatter1 = dict(
      mode = "markers",
      name = "Cluster 1",
      type = "scatter3d",
      x = cluster1.values[:, 0], y = cluster1.values[:, 1], z = cluster1.values[:, 2],
      marker = dict(size=2, color='green')
  )
  scatter2 = dict(
      mode = "markers",
      name = "Cluster 2",
      type = "scatter3d",
      x = cluster2.values[:, 0], y = cluster2.values[:, 1], z = cluster2.values[:, 2],
      marker = dict(size=2, color='blue')
  )
  scatter3 = dict(
      mode = "markers",
      name = "Cluster 3",
      type = "scatter3d",
      x = cluster3.values[:, 0], y = cluster3.values[:, 1], z = cluster3.values[:, 2],
      marker = dict(size=2, color='red')
  )
  # Translucent mesh3d hulls around each cluster (named mesh1-3 so they do not
  # shadow the cluster dataframes above)
  mesh1 = dict(
      alphahull = 5,
      name = "Cluster 1",
      opacity = .1,
      type = "mesh3d",
      x = cluster1.values[:, 0], y = cluster1.values[:, 1], z = cluster1.values[:, 2],
      color = 'green', showscale = True
  )
  mesh2 = dict(
      alphahull = 5,
      name = "Cluster 2",
      opacity = .1,
      type = "mesh3d",
      x = cluster2.values[:, 0], y = cluster2.values[:, 1], z = cluster2.values[:, 2],
      color = 'blue', showscale = True
  )
  mesh3 = dict(
      alphahull = 5,
      name = "Cluster 3",
      opacity = .1,
      type = "mesh3d",
      x = cluster3.values[:, 0], y = cluster3.values[:, 1], z = cluster3.values[:, 2],
      color = 'red', showscale = True
  )
  layout = dict(
      title = 'Interactive Cluster Shapes in 3D',
      scene = dict(
          xaxis = dict(zeroline=True),
          yaxis = dict(zeroline=True),
          zaxis = dict(zeroline=True),
      )
  )
  fig = dict(data=[scatter1, scatter2, scatter3, mesh1, mesh2, mesh3], layout=layout)
  # Use plotly.offline.iplot() inside a Jupyter/IPython notebook
  plotly.offline.iplot(fig, filename='mesh3d_sample')



Visualizing K-Means Clusters in N Dimensions 
What if you’re clustering over more than 3 columns? How do you visualize that? One common approach is to split the 4th dimension data into groups and plot a 3D graph for each of those groups. Another approach is to split all the data into groups based on the k-means cluster value, then apply an aggregation function such as sum or average to all the dimensions in that group, then plot those aggregate values in a heatmap. This approach is described in the next section. 

Visualizing Higher Order clusters for a Customer 360 scenario 
In the following notebook, I’ve produced an artificial dataset with 12 feature columns. I’m using this dataset to simulate a customer 360 dataset in which customers of a large bank have been characterized by a variety of attributes, such as the balances in various accounts. By plotting the k-means cluster groups and feature columns in a heatmap, we can illustrate how a large bank could use machine learning to categorize its customer base into groups, so that it could conceivably develop things like marketing campaigns or recommendation engines that more accurately target the concerns of the customers in those groups (kmeans360.ipynb): 
  # Generate an artificial 12-feature dataset and plot per-cluster sums as a heatmap
  from sklearn.datasets import make_classification
  import pandas as pd
  import plotly
  import plotly.graph_objs as go
  plotly.offline.init_notebook_mode()

  def rename_columns(df, prefix='x'):
      """
      Rename the columns of a dataframe to have the prefix in front of them

      :param df: data frame we're operating on
      :param prefix: the prefix string
      """
      df = df.copy()
      df.columns = [prefix + str(i) for i in df.columns]
      return df

  # create an artificial dataset with 4 clusters
  X, Y = make_classification(n_samples=100, n_classes=4, n_features=12, n_redundant=0,
                             n_informative=12, scale=1000, n_clusters_per_class=1)
  df = pd.DataFrame(X)
  # ensure all values are positive (this is needed for our customer 360 use-case)
  df = df.abs()
  # rename X columns
  df = rename_columns(df)
  # and add the Y
  df['y'] = Y

  # split df into cluster groups
  grouped = df.groupby(['y'], sort=True)

  # compute sums for every column in every group
  sums = grouped.sum()
  sums

  # one x label per feature column (12 labels for the 12 features)
  data = [go.Heatmap(z=sums.values.tolist(),
                     y=['Persona A', 'Persona B', 'Persona C', 'Persona D'],
                     x=['Debit Card',
                        'Personal Credit Card',
                        'Business Credit Card',
                        'Home Mortgage Loan',
                        'Auto Loan',
                        'Brokerage Account',
                        'Roth IRA',
                        '401k',
                        'Home Insurance',
                        'Automobile Insurance',
                        'Medical Insurance',
                        'Life Insurance'],
                     colorscale='Viridis')]

  plotly.offline.iplot(data, filename='pandas-heatmap')
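Note that the notebook above reuses the generated class labels Y as stand-ins for cluster assignments. A sketch of what the actual k-means step could look like is below; it uses scikit-learn for brevity (the post targets Spark, whose pyspark.ml.clustering.KMeans plays the analogous role), and the random_state and n_init values are illustrative assumptions:

```python
# Assign real k-means cluster labels instead of reusing the generated Y labels
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Same artificial 12-feature customer dataset, made positive for the use-case
X, _ = make_classification(n_samples=100, n_classes=4, n_features=12, n_redundant=0,
                           n_informative=12, scale=1000, n_clusters_per_class=1)
df = pd.DataFrame(X).abs()

# Fit k-means with k=4 and store each row's cluster index in a 'y' column
df['y'] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(df)

# Per-cluster column sums -- the values the heatmap above would display
sums = df.groupby('y', sort=True).sum()
print(sums.shape)  # (4, 12): one row per cluster, one column per feature
```

The resulting `sums` dataframe drops straight into the go.Heatmap call from the notebook.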


[Git FAQ] error: The following untracked working tree files would be overwritten by merge

Source From Here 
Option 1:
  // x: also remove ignored files and files that git does not recognize
  // d: remove untracked files under directories not tracked by git
  // f: force the clean
  # git clean -d -fx
Option 2: Today on the server gi...