程式扎記: January 2018

Saturday, January 20, 2018

[ Python Article Collection ] A tutorial on python-daemon – or – Why doesn’t python-daemon have any documentation?

Source From Here
Introduction
A few weeks ago I needed to create a daemon for a school project. I had never really dealt with daemons before, so I browsed the Internet to find out what they were and how they worked. After reading some pages, I found out how hard working with daemons seemed to be: you need to deal with the correct forking of a process, prevent core dump generation, change root and working directories, change process and file umasks and ownership, and quite a lot of other OS-related stuff you can find listed here. By the way, at this time I’m still not quite sure how these conditions came to be.

Wait. What is a daemon?
A little side note. Although the word “daemon” might sound like something diabolical, it has nothing to do with Satan. In fact it comes from the Greek word δαίμων, which refers to the spirits people have inside and which eventually define them. A daemon is a process, no more and no less than your browser’s process is. The key difference, though, is that a daemon is a process that doesn’t need user input to work. Think, for instance, about a web server: it isn’t waiting for its own user to perform some action, but rather for some other host on the network to make a request. Such a request needs to be processed without any human taking part in it.

So a daemon is a process that performs what you could think of as a background task, be it a server like a web server or an ssh server, or something more complicated like systemd. OS-wise, daemons are specific to the world of Unix. If you’re running Windows or Mac OS X, keep in mind that Unix daemons have counterparts in other OSes: Windows has its so-called services; OS X calls them sometimes “daemons” and sometimes “agents”. They work pretty much like Unix daemons, but OS X sort of expects you to make them compatible with launchd.

How can I create a daemon?
There are a lot of ways you can create a daemon in Unix, since nobody enforces or supports one in favor of the others. The quickest one: within the shell of your choice, type in the command you want to daemonize and add ‘&’ at the end of it.

# python spam.py &

And that’s it. You have a daemon.

Now, while this is a very fast way to spawn a daemon, it might not be the most reasonable choice for a number of reasons: the process will output anything to your current shell (using ‘&’ doesn’t close the stdout and stderr file descriptors), you can’t assign the daemon any PID lock file so multiple daemons might be running at the same time (I’ll come back to the lock file later) and often that makes it generally harder to control the daemon.

Nonetheless this remains the fastest way to create a daemon in Unix. Just a character away. And sometimes speed and simplicity are exactly what you need.

There are also a couple of recipes online that will do the job for you. I won’t go through the details of all of them; I’ll just link a few here:

* Creating a daemon the python way — by Chad J. Schroeder
* A simple unix/linux daemon in Python — by Sander Marechal
* Daemon with start/stop/restart behavior — by Clark Evans

Using a recipe does have some consequences: you cannot pip install it and you have no simple way to know whether there is going to be an update for it (crucial in a security-aware context), but still these recipes will get the job done.

Use “daemon”. Here’s a nice article about it. This is a pretty nice utility, which will do most of the job for you, similarly to python-daemon, but if you’re relying on Python code for running your daemon, it may not be the best choice. Anyway, it’s simple enough that you might want to look at it if you need to spawn daemons in C or any other language.

While quite a lot of people (like me) find it tedious to re-invent the wheel, others sometimes feel like they need to re-invent the whole TCP/IP stack from the ground up. I won’t judge you for this. You’re going to have a lot of fun with that. Here are a few modules I found a lot of people were using while writing their homemade daemons: os.fork, os.setsid, signal, subprocess.
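To give an idea of what the homemade route involves, here is a minimal sketch of the classic double fork built on os.fork and os.setsid. The function name daemonize and its default arguments are made up for this illustration; it covers only part of what python-daemon does and is not a substitute for it.

```python
import os
import sys


def daemonize(workdir="/", umask=0o022):
    """Detach the calling process with the classic double fork.

    Hypothetical helper for illustration. It returns only in the final
    daemon process; the two intermediate processes leave via os._exit().
    """
    # First fork: the original process exits, so the shell gets its prompt back.
    if os.fork() > 0:
        os._exit(0)

    # Become the leader of a new session, dropping the controlling terminal.
    os.setsid()

    # Second fork: the session leader exits, so the surviving process
    # can never reacquire a controlling terminal.
    if os.fork() > 0:
        os._exit(0)

    os.chdir(workdir)
    os.umask(umask)

    # Point the three standard streams at /dev/null.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

Calling daemonize() at the top of a script and then entering the main loop is the usual pattern. Lock files, signal handling and dropping privileges are all left out of this sketch.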

Ok, here we are. This is the way I create daemons and the way I would recommend to most people: using python-daemon.

It doesn’t require a lot of knowledge about the underlying machinery that gets the daemon to run, its source code is rather readable, its APIs are very simple (we’ll go through these later), you can pip install it, it is still maintained, and it was going to be the standard way to create daemons in Python. Its main weakness is its lack of documentation: the documentation you’ll find online is sparse and you’ll often need to look at its source code if you encounter any bugs.

So what is python-daemon?
Back in the first weeks of 2009 PEP 3143 was created. Its aim was to create “a package [in] the Python standard library that provides a simple interface to the task of becoming a daemon process.” While the goal was not an impossible task and quite some people were interested in seeing this project succeed, it didn’t make it. The guy that was in charge of doing it simply didn’t have enough time anymore and no one stepped in to save the project. Such a sad death for such a nice project.

This tragedy didn’t affect the functionality of python-daemon too much (it does include basically anything you need for a daemon), but rather its documentation. As I said earlier, its weakest point is documentation: you’ll find some inside the PEP and some within the code itself.

And how do you make it work?
Ok, I guess I got you interested in python-daemon since you’re still reading. First, let me start off by telling you what you shouldn’t use in this library: DaemonRunner. Googling python-daemon will find some pages that will point you to the DaemonRunner object to handle your daemon, but it is a deprecated part of the library.

Instead, you want to use the DaemonContext API which is used inside of DaemonRunner. It’s true that DaemonRunner extends DaemonContext functionality, but it does so in a very old-fashioned way (doesn’t use argparse for instance). This is probably the reason why it ended up being deprecated.

Without further ado, DaemonContext makes it super simple to start your daemon with just a context manager:

import daemon

with daemon.DaemonContext():
    main()  # main() stands for your program's entry point

This is the most basic configuration you can pass to DaemonContext, and it will actually create a well-behaving daemon with just one line of code and four spaces of indentation. I’ll give you an overview of what you can set in order to have a more complex and detailed configuration for the daemon you need.

Dealing with the file system
A daemon is a fairly peculiar process: since it is unbound from human interaction, a daemon will have its own keys to be identified user-wise. This means that, regardless of the user that started a daemon, the daemon will have its own UID, GID (User/Group ID), its own root and working directories, and its own umask.

Don’t be afraid, DaemonContext will take care of this stuff for you, even with just the default configuration, but let’s say that you need to customize it. To change the root directory, which is useful for confining your daemon, simply set the chroot_directory argument to a valid directory on your file system. The same goes for the working directory, which is more commonly changed, through the working_directory argument. By default, DaemonContext will set your working directory to the root “/”. For example:

import os
import daemon

with daemon.DaemonContext(
        chroot_directory=None,
        working_directory='/var/lib/myprettylittledaemon'):
    print(os.getcwd())

In case you don’t see on-screen the result of print, that’s because you need to keep the stdout stream open. Such configuration is explained in the “Preserve files” paragraph below. For the UID and GID, DaemonContext by default “will relinquish any effective privilege elevation inherited by the process” which is usually the reason why you need to change them. In case you don’t find this satisfactory, the process is still pretty straight-forward: set them to what you need, provided that your user is granted permission to do so. In case your user doesn’t have root permissions, DaemonContext will raise a DaemonOSEnvironmentError exception.

import os
import daemon

with daemon.DaemonContext(
        uid=1001,
        gid=777):
    print(os.getuid())
    print(os.getgid())

Additionally, you might want to set the daemon umask, which will set the mode the daemon will create files with (Check os.umask):

import os
import daemon

with daemon.DaemonContext(
        umask=0o002):
    your_mask = os.umask(0)  # I'm doing this weird three-line trick
    print(your_mask)         # to print the umask set by DaemonContext,
    os.umask(your_mask)      # because os.umask returns the previous mask.
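To see what a umask actually does to newly created files, here is a small standard-library sketch, independent of python-daemon. The helper name mode_created_under is invented for this example, and the results in the comments assume an ordinary POSIX filesystem without default ACLs.

```python
import os
import stat
import tempfile


def mode_created_under(mask):
    """Create a file requesting mode 0o666 under `mask`, return its permission bits."""
    previous = os.umask(mask)  # os.umask sets the new mask and returns the old one
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # always restore the caller's mask


print(oct(mode_created_under(0o002)))  # 0o664: only "write for others" is masked out
print(oct(mode_created_under(0o027)))  # 0o640: group write and all "other" bits masked out
```

The bits set in the mask are removed from whatever mode the program requests, which is why 0o002 is a common choice when files should stay group-writable.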

Preserve files.
One thing to take into account when creating a daemon is that on start DaemonContext will close any open files you have around. This is normal and it’s what it is supposed to do. Now, even though this is the behavior we should expect from the library, you might still need some files to be opened in your program. You can do this by declaring what files you won’t need to be closed through the files_preserve argument. For instance:

import daemon

some_important_file = open('AVERYBIGDATABASE', 'r')

with daemon.DaemonContext(
        files_preserve=[some_important_file]):
    print(some_important_file.readlines())

Along with your open files, DaemonContext will also close the standard streams file descriptors, namely stdin, stdout and stderr. By default it will redirect them to os.devnull. If you need to keep them open, simply set the stdin, stdout and stderr arguments according to your needs.

import sys
import daemon

with daemon.DaemonContext(
        stdout=sys.stdout,
        stderr=sys.stderr):
    print("Hello World! Daemon here.")

Handling OS signals
Signals coming from the OS are important whether or not your program is daemonized. For a daemon they matter even more, since they might become the only way a human interacts with your process. DaemonContext conveniently lets you pass a dictionary in the signal_map argument that maps each signal you want to configure to its handler. Some popular ones are: SIGINT, SIGKILL, SIGTERM, SIGTSTP (keep in mind that SIGKILL cannot be caught or ignored, so you cannot install a handler for it). You can find further details here.

import signal
import sys

import daemon

def shutdown(signum, frame):  # signum and frame are mandatory
    sys.exit(0)

with daemon.DaemonContext(
        signal_map={
            signal.SIGTERM: shutdown,
            signal.SIGTSTP: shutdown
        }):
    main()
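Exiting straight from the handler is fine for small programs. A common alternative, sketched below with the standard library only, is to have the handlers set flags that the main loop checks, so that work in progress can finish cleanly. The names state, request_stop, request_reload and main_loop are invented for this example. The signal_map dictionary has the shape DaemonContext expects, and PEP 3143 states that a value of None means the signal is ignored.

```python
import signal
import time

# Flags flipped by the handlers and polled by the main loop (hypothetical names).
state = {"stop": False, "reload": False}


def request_stop(signum, frame):
    state["stop"] = True


def request_reload(signum, frame):
    state["reload"] = True


# Could be passed as DaemonContext(signal_map=signal_map).
signal_map = {
    signal.SIGTERM: request_stop,
    signal.SIGHUP: request_reload,
    signal.SIGTSTP: None,  # None: ignore the signal
}


def main_loop(do_work, poll_seconds=1.0):
    """Run do_work() repeatedly until a stop has been requested."""
    while not state["stop"]:
        if state["reload"]:
            state["reload"] = False
            # Re-read configuration here.
        do_work()
        time.sleep(poll_seconds)
```

With this layout, `kill -TERM <pid>` lets the current do_work() call finish before the loop exits, instead of interrupting it halfway.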

One at a time
More often than not, daemons will use resources, such as a TCP port for a listening server or some files on disk. You’ll probably want to make sure that there aren’t multiple daemons competing for these resources. To make sure that only one instance of your daemon is running at a time, you can use a PID lock file: a file containing the PID of a process, whose presence prevents the same program from running as more than one instance. Please note that it is the duty of the newly spawned process (handled within DaemonContext) to check the lock file and abort the start procedure. If you’re already familiar with threading.Lock, the concept is basically the same.

You can set a lock file like this:

import daemon
import lockfile

with daemon.DaemonContext(
        pidfile=lockfile.FileLock('/var/run/spam.pid')):
    main()
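To make the idea behind a PID lock file concrete, here is a rough standard-library sketch. It is not how lockfile or python-daemon implement it; the helper names acquire_pidfile and release_pidfile are made up, and stale files left behind by a crashed process are not handled.

```python
import os


def acquire_pidfile(path):
    """Atomically create `path` holding our PID; return False if it already exists."""
    try:
        # O_EXCL makes the creation fail if the file is already there,
        # so two processes cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write("%d\n" % os.getpid())
    return True


def release_pidfile(path):
    """Remove the PID file so that another instance may start."""
    os.remove(path)
```

A second instance calling acquire_pidfile on the same path gets False and should abort its start procedure, which is the behavior described above.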

Start/stop/reload
A common pattern for a daemon to interact with its administrator is to provide a start/stop/reload behavior, which is usually implemented as a set of command line arguments. This is particularly useful if you’re planning to support initd. DaemonContext, though, will not take care of this for you. DaemonRunner does have code for this behavior, but I wouldn’t advise you to use it directly since it is deprecated. You can still use its source code as a reference; for further details, take a look at the _start and _stop methods.
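A start/stop/reload command line can be sketched with argparse as follows. This is only one possible layout, not code from python-daemon. It assumes the PID file contains the daemon's PID as text, that the daemon reacts to SIGTERM and SIGHUP, and that names such as build_parser, read_pid, dispatch and the default path /var/run/spam.pid are placeholders.

```python
import argparse
import os
import signal


def build_parser():
    parser = argparse.ArgumentParser(prog="spam")
    parser.add_argument("action", choices=["start", "stop", "reload"])
    parser.add_argument("--pidfile", default="/var/run/spam.pid")
    return parser


def read_pid(pidfile):
    """Return the PID stored in `pidfile` (assumed to hold a decimal number)."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def dispatch(args, start):
    """Run `start` or signal the already running daemon, depending on the action."""
    if args.action == "start":
        start()  # e.g. a function that enters DaemonContext and calls main()
    elif args.action == "stop":
        os.kill(read_pid(args.pidfile), signal.SIGTERM)
    else:  # "reload"
        os.kill(read_pid(args.pidfile), signal.SIGHUP)
```

A script would typically end with `dispatch(build_parser().parse_args(), start=run_daemon)`, where run_daemon is whatever function wraps your DaemonContext block.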

Conclusions
The package python-daemon is absolutely not the only way you can create a daemon for a Python program; you should carefully consider every possibility you have. If your choice is python-daemon, we have gone through most of the configuration options of DaemonContext. I haven’t covered all of them, though: if you’re still looking for more options you should look in the PEP, and if you can’t find enough information there, have a look at DaemonContext’s source code.

Supplement
* PEP 3143 -- Standard daemon process library

Thursday, January 18, 2018

[ Python Article Collection ] Using Python's Watchdog to monitor changes to a directory

Source From Here
Preface
Watchdog is a handy Python package which watches for changes to the filesystem; on Linux it relies on the inotify kernel subsystem. This makes it an excellent foundation for a small script which takes action whenever a file is received in a directory, or whenever any of the directory's contents change. An example might be a client-facing sftp server where you want to receive an email when a file is received.

You can install the package with the command below:

# pip install watchdog

How to
First create the monitoring script. It will run daemonized and observe any changes to the given directory. Three modules/classes will be used in that script:

* time from Python will be used to sleep the main loop
* watchdog.observers.Observer is the class that will watch for any change, and then dispatch the event to the specified handler.
* watchdog.events.PatternMatchingEventHandler is the class that will take the event dispatched by the observer and perform some action

- watch_for_changes.py

import sys  # needed later for sys.argv
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

PatternMatchingEventHandler inherits from FileSystemEventHandler and exposes some useful methods:

* on_any_event: if defined, will be executed for any event
* on_created: Executed when a file or a directory is created
* on_modified: Executed when a file is modified or a directory renamed
* on_moved: Executed when a file or directory is moved
* on_deleted: Executed when a file or directory is deleted.

Each one of those methods receives the event object as first parameter, and the event object has 3 attributes:

* event_type: 'modified' | 'created' | 'moved' | 'deleted'
* is_directory: True | False
* src_path: path/to/observed/file

So to create a handler, just inherit from one of the existing handlers. For this example, PatternMatchingEventHandler will be used to match only xml files. To keep things simple, I will put the file processing in a single method and implement only on_modified and on_created, which means that my handler will ignore any other events.

The patterns attribute is also defined, to watch only for files with xml or lxml extensions.

class MyHandler(PatternMatchingEventHandler):  
    patterns = ["*.xml", "*.lxml"]  
  
    def process(self, event):  
        """  
        event.event_type   
            'modified' | 'created' | 'moved' | 'deleted'  
        event.is_directory  
            True | False  
        event.src_path  
            path/to/observed/file  
        """  
        # the file will be processed there  
        print("%s %s" % (event.src_path, event.event_type))  # print for now, only for debugging
  
    def on_modified(self, event):  
        self.process(event)  
  
    def on_created(self, event):  
        self.process(event)  
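The patterns are shell-style wildcards. As a rough illustration of which paths a list like the one above lets through, here is a standard-library sketch using fnmatch. It only approximates watchdog's own matching code, the function name matches is invented, and the ignore_patterns argument mirrors the handler option of the same name.

```python
from fnmatch import fnmatchcase


def matches(path, patterns, ignore_patterns=()):
    """Return True if `path` matches any pattern and no ignore pattern."""
    if any(fnmatchcase(path, pattern) for pattern in ignore_patterns):
        return False
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = ["*.xml", "*.lxml"]
print(matches("/tmp/test.xml", patterns))    # True
print(matches("/tmp/notes.txt", patterns))   # False
print(matches("/tmp/old.bak.xml", patterns, ignore_patterns=["*.bak.xml"]))  # False
```

Because "*" in fnmatch also matches path separators, a pattern such as "*.xml" applies to files at any depth below the watched directory.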

With the above handler, only creation and modification will be watched. Now the Observer needs to be scheduled.

if __name__ == '__main__':  
    args = sys.argv[1:]  
    observer = Observer()  
    observer.schedule(MyHandler(), path=args[0] if args else '.')  
    observer.start()  
  
    try:  
        while True:  
            time.sleep(1)  
    except KeyboardInterrupt:  
        observer.stop()  
  
    observer.join()  

Notes.

You can set the named argument "recursive" to True in observer.schedule if you want to watch for files in subfolders.

That's all that is needed to watch for modifications in the given directory. The script takes the path given as its first parameter, or the current directory by default.

# python watch_for_changes.py /path/to/directory

Let it run in a shell and open another one or the file browser to change or create new .xml files in the /path/to/directory.

# echo "testing" > /tmp/test.xml

Since the handler is printing the results, the output should be:

/tmp/test.xml created
/tmp/test.xml modified

Now, to complete the script, we only need to implement in the process method the logic necessary to parse the file and insert it into a database. For example, suppose the xml file contains some data about the current track on a web radio.

The easiest way to parse this small xml is using xmltodict library.

# pip install xmltodict

With the xmltodict.parse function, the above xml will be output as an OrderedDict:

OrderedDict([(u'Pulsar',  
    OrderedDict([(u'OnAir',  
        OrderedDict([(u'media_type', u'default'),  
        (u'media',   
            OrderedDict([(u'title1', u'JOVEM PAN FM'),  
                         (u'title2', u'100,9MHz'),  
                         (u'title3', u'A maior rede de radio do Brasil'),  
                         (u'title4', u'00:00:00'),  
                         (u'media_id1', u'#ID_Title#'),  
                         (u'media_id2', u'#ID_SubTitle#'),  
                         (u'media_id3', u'#ID_Album#'),  
                         (u'hour', u'2013-12-07 11:44:32'),  
                         (u'length', u'#Duration#'),  
                         (u'ISRC', u'#Code#'),  
                         (u'id_singer', u'#ID_Singer#'),  
                         (u'id_song', u'#ID_Song#'),  
                         (u'id_album', u'#ID_Album#'),  
                         (u'id_jpg', u'#Jpg#')]))]))]))])  

Now we can just access that dict to create the record on the filesystem or somewhere else. Notice that I make heavy use of the dict get method to avoid KeyErrors:

with open(event.src_path, 'r') as xml_source:  
    xml_string = xml_source.read()  
    parsed = xmltodict.parse(xml_string)  
    element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')  
    if not element:  
        return  
    print(dict(element))

and the output will be:

{u'hour': u'2013-12-07 11:44:32',
u'title2': u'100,9MHz',
u'id_album': u'#ID_Album#',
u'title1': u'JOVEM PAN FM',
u'length': u'#Duration#',
u'title3': u'A maior rede de radio do Brasil',
u'title4': u'00:00:00',
u'ISRC': u'#Code#',
u'id_song': u'#ID_Song#',
u'media_id2': u'#ID_SubTitle#',
u'media_id1': u'#ID_Title#',
u'id_jpg': u'#Jpg#',
u'media_id3': u'#ID_Album#',
u'id_singer': u'#ID_Singer#'}
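For comparison, the same lookup can be sketched with the standard library's xml.etree.ElementTree. The XML below is not taken from the original article: it is reconstructed from the parsed output shown above and shortened to a few fields, so treat its exact layout as an assumption. The function name media_fields is also made up.

```python
import xml.etree.ElementTree as ET

# Sample document reconstructed from the parsed output above (an assumption).
SAMPLE = """\
<Pulsar>
  <OnAir>
    <media_type>default</media_type>
    <media>
      <title1>JOVEM PAN FM</title1>
      <title2>100,9MHz</title2>
      <title3>A maior rede de radio do Brasil</title3>
      <title4>00:00:00</title4>
      <media_id1>#ID_Title#</media_id1>
      <hour>2013-12-07 11:44:32</hour>
    </media>
  </OnAir>
</Pulsar>
"""


def media_fields(xml_string):
    """Return the children of Pulsar/OnAir/media as a plain dict, or {} if absent."""
    root = ET.fromstring(xml_string)   # root is the <Pulsar> element
    media = root.find("OnAir/media")   # path is relative to the root
    if media is None:
        return {}
    return {child.tag: child.text for child in media}


print(media_fields(SAMPLE)["title1"])  # JOVEM PAN FM
```

This avoids an extra dependency, at the cost of writing the path lookup and the dict conversion by hand.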

Much better than XPATH, and for this particular case, where the xml_source is small, there will be no relevant performance issue. Now we only need to get the values and populate the database; in my case I will use Redis DataModel as storage. I will also use the magicdate module to automagically convert the date string to a datetime object. The complete code is below:

import sys
import time
import xmltodict
from magicdate import magicdate  # import the function: the module itself is not callable
from watchdog.observers import Observer  
from watchdog.events import PatternMatchingEventHandler  
  
from .models import Media  
  
  
class MyHandler(PatternMatchingEventHandler):  
    patterns=["*.xml"]  
  
    def process(self, event):  
        """  
        event.event_type  
            'modified' | 'created' | 'moved' | 'deleted'  
        event.is_directory  
            True | False  
        event.src_path  
            path/to/observed/file  
        """  
  
        with open(event.src_path, 'r') as xml_source:  
            xml_string = xml_source.read()  
            parsed = xmltodict.parse(xml_string)  
            element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')  
            if not element:  
                return  
  
            media = Media(  
                title=element.get('title1'),  
                description=element.get('title3'),  
                media_id=element.get('media_id1'),  
                hour=magicdate(element.get('hour')),  
                length=element.get('title4')  
            )  
            media.save()  
  
    def on_modified(self, event):  
        self.process(event)  
  
    def on_created(self, event):  
        self.process(event)  
  
  
if __name__ == '__main__':  
    args = sys.argv[1:]  
    observer = Observer()  
    observer.schedule(MyHandler(), path=args[0] if args else '.')  
    observer.start()  
  
    try:  
        while True:  
            time.sleep(1)  
    except KeyboardInterrupt:  
        observer.stop()  
  
    observer.join()  

Supplement
* Using Python's Watchdog to monitor changes to a directory
