程式扎記 (Programming Notes): [ Python Article Collection ] Using Python's Watchdog to monitor changes to a directory

Source From Here
Preface
Watchdog is a handy Python package that watches the filesystem for changes. On Linux it uses the kernel's inotify subsystem; on other platforms it uses the corresponding native API (such as FSEvents on macOS or ReadDirectoryChangesW on Windows), falling back to polling where no native API is available. This makes it an excellent foundation for a small script that takes action whenever a file is received in a directory, or whenever any of the directory's contents change. One example is a client-facing SFTP server, where you may want to receive an email whenever a file arrives.

You can install the package with the following command:

# pip install watchdog

How to
First, create the monitoring script. It runs continuously, like a daemon, and observes any changes to the given directory. The script uses these modules/classes:

* sys from the Python standard library, used to read the directory path from the command line
* time from the Python standard library, used to sleep in the main loop
* watchdog.observers.Observer, the class that watches for changes and dispatches each event to the specified handler
* watchdog.events.PatternMatchingEventHandler, the class that receives the events dispatched by the observer and performs some action

- watch_for_changes.py

import sys
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

PatternMatchingEventHandler inherits from FileSystemEventHandler and exposes some useful methods:

* on_any_event: if defined, executed for any event
* on_created: executed when a file or directory is created
* on_modified: executed when a file or directory is modified
* on_moved: executed when a file or directory is moved or renamed
* on_deleted: executed when a file or directory is deleted

Each of these methods receives the event object as its first parameter. The event object has these attributes:

* event_type: 'modified' | 'created' | 'moved' | 'deleted'
* is_directory: True | False
* src_path: path/to/observed/file
* dest_path: the new path (moved events only)
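To see how an event reaches these methods, here is a simplified, stdlib-only model of the dispatch step. It is not watchdog's source: `SketchEvent`, `SketchHandler` and `RecordingHandler` are hypothetical names, and the routing rule (call on_any_event first, then on_<event_type>) is meant to mirror, in reduced form, what watchdog's FileSystemEventHandler.dispatch does. PatternMatchingEventHandler additionally filters by pattern before this step, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class SketchEvent:
    """Hypothetical stand-in for a watchdog event object."""
    event_type: str  # 'modified' | 'created' | 'moved' | 'deleted'
    src_path: str    # path/to/observed/file
    is_directory: bool = False


class SketchHandler:
    """Hypothetical model of a handler: every hook is a no-op by default."""

    def dispatch(self, event):
        # on_any_event sees every event first...
        self.on_any_event(event)
        # ...then the hook named after the event type is called.
        getattr(self, "on_" + event.event_type)(event)

    def on_any_event(self, event):
        pass

    def on_created(self, event):
        pass

    def on_modified(self, event):
        pass

    def on_moved(self, event):
        pass

    def on_deleted(self, event):
        pass


class RecordingHandler(SketchHandler):
    """Overrides only on_created and on_modified; other events fall through to no-ops."""

    def __init__(self):
        self.seen = []

    def on_created(self, event):
        self.seen.append((event.src_path, event.event_type))

    def on_modified(self, event):
        self.seen.append((event.src_path, event.event_type))


handler = RecordingHandler()
for kind in ("created", "modified", "deleted"):
    handler.dispatch(SketchEvent(kind, "/tmp/test.xml"))
print(handler.seen)
# [('/tmp/test.xml', 'created'), ('/tmp/test.xml', 'modified')]
```

The deleted event is silently ignored because RecordingHandler does not override on_deleted, which is the same effect as implementing only on_created and on_modified in a real handler.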

To create a handler, inherit from one of the existing handlers; this example uses PatternMatchingEventHandler so that only XML files are matched. To keep things simple, I put all the file processing in a single method, process(), and implement only on_modified and on_created, which means my handler ignores every other event.

The patterns attribute restricts the handler to files with the .xml or .lxml extension.
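Which paths does a pattern list like this let through? A rough, stdlib-only approximation uses pathlib's PurePath.match, which compares a glob pattern against the rightmost components of a path. Treat it as a sketch: the helper `matches_patterns` is hypothetical and watchdog's own matching code may differ in details. Lowercasing both sides mimics watchdog's default of case-insensitive matching (`case_sensitive=False`).

```python
from pathlib import PurePosixPath

PATTERNS = ["*.xml", "*.lxml"]


def matches_patterns(path, patterns=PATTERNS):
    """Return True if path matches any glob pattern, ignoring case."""
    # A relative pattern such as "*.xml" is matched against the last path
    # component, so it matches "/tmp/test.xml" regardless of the directory.
    candidate = PurePosixPath(path.lower())
    return any(candidate.match(pattern.lower()) for pattern in patterns)


for p in ["/tmp/test.xml", "/tmp/feed.LXML", "/tmp/notes.txt", "/tmp/xml"]:
    print(p, matches_patterns(p))
```

Events for paths that match none of the patterns never reach on_created or on_modified, so process() only ever sees XML files.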

class MyHandler(PatternMatchingEventHandler):  
    patterns = ["*.xml", "*.lxml"]  
  
    def process(self, event):  
        """  
        event.event_type   
            'modified' | 'created' | 'moved' | 'deleted'  
        event.is_directory  
            True | False  
        event.src_path  
            path/to/observed/file  
        """  
        # the file will be processed here
        print(event.src_path, event.event_type)  # print for now, only for debugging
  
    def on_modified(self, event):  
        self.process(event)  
  
    def on_created(self, event):  
        self.process(event)  

With the above handler, only creation and modification events are handled. Now the Observer needs to be scheduled:

if __name__ == '__main__':  
    args = sys.argv[1:]  
    observer = Observer()  
    observer.schedule(MyHandler(), path=args[0] if args else '.')  
    observer.start()  
  
    try:  
        while True:  
            time.sleep(1)  
    except KeyboardInterrupt:  
        observer.stop()  
  
    observer.join()  

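Notes:

observer.schedule() watches only the given directory by default. Pass the keyword argument recursive=True (for example, observer.schedule(MyHandler(), path='.', recursive=True)) if you also want to watch files in subfolders.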

That's all that's needed to watch for modifications in a directory. The script watches the current directory by default, or the path given as its first argument:

# python watch_for_changes.py /path/to/directory

Leave it running in one shell, then use another shell (or a file browser) to change or create .xml files in /path/to/directory. For example, with the script watching /tmp:

# echo "testing" > /tmp/test.xml

Since the handler prints each event, the output should be:

/tmp/test.xml created
/tmp/test.xml modified

To complete the script, you only need to implement, in the process method, the logic that parses the file and inserts its data into the database. For example, suppose the XML file contains data about the track currently playing on a web radio station.
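The sample file itself is not shown here, so the script below writes a document reconstructed from the xmltodict output that follows (no attributes, since xmltodict would report those as '@'-prefixed keys). Treat it as an approximation: the original file may differ in whitespace or in having an XML declaration, and the file name `current_track.xml` and the helper `write_sample` are hypothetical.

```python
from pathlib import Path

# Reconstructed from the parsed output shown further down.
SAMPLE_XML = """<Pulsar>
  <OnAir>
    <media_type>default</media_type>
    <media>
      <title1>JOVEM PAN FM</title1>
      <title2>100,9MHz</title2>
      <title3>A maior rede de radio do Brasil</title3>
      <title4>00:00:00</title4>
      <media_id1>#ID_Title#</media_id1>
      <media_id2>#ID_SubTitle#</media_id2>
      <media_id3>#ID_Album#</media_id3>
      <hour>2013-12-07 11:44:32</hour>
      <length>#Duration#</length>
      <ISRC>#Code#</ISRC>
      <id_singer>#ID_Singer#</id_singer>
      <id_song>#ID_Song#</id_song>
      <id_album>#ID_Album#</id_album>
      <id_jpg>#Jpg#</id_jpg>
    </media>
  </OnAir>
</Pulsar>
"""


def write_sample(directory):
    """Write SAMPLE_XML as current_track.xml in directory and return its path."""
    target = Path(directory) / "current_track.xml"
    target.write_text(SAMPLE_XML, encoding="utf-8")
    return target
```

Calling write_sample('/tmp') while the watcher runs on /tmp should produce created/modified lines similar to the test.xml example above.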

The easiest way to parse a small XML file like this is with the xmltodict library.

# pip install xmltodict

The xmltodict.parse function converts this track XML into an OrderedDict:

OrderedDict([(u'Pulsar',  
    OrderedDict([(u'OnAir',  
        OrderedDict([(u'media_type', u'default'),  
        (u'media',   
            OrderedDict([(u'title1', u'JOVEM PAN FM'),  
                         (u'title2', u'100,9MHz'),  
                         (u'title3', u'A maior rede de radio do Brasil'),  
                         (u'title4', u'00:00:00'),  
                         (u'media_id1', u'#ID_Title#'),  
                         (u'media_id2', u'#ID_SubTitle#'),  
                         (u'media_id3', u'#ID_Album#'),  
                         (u'hour', u'2013-12-07 11:44:32'),  
                         (u'length', u'#Duration#'),  
                         (u'ISRC', u'#Code#'),  
                         (u'id_singer', u'#ID_Singer#'),  
                         (u'id_song', u'#ID_Song#'),  
                         (u'id_album', u'#ID_Album#'),  
                         (u'id_jpg', u'#Jpg#')]))]))]))])  

Now we can use that dict to create a record in the filesystem or elsewhere. Notice that I use the dict get method heavily to avoid KeyErrors:

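# Fragment of MyHandler.process(self, event); needs "import xmltodict".
with open(event.src_path, 'r') as xml_source:
    xml_string = xml_source.read()
    parsed = xmltodict.parse(xml_string)
    # Each missing key falls back to {} (or None at the last level) instead of raising KeyError.
    element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
    if not element:
        return
    print(dict(element))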

and the output will be:

{u'hour': u'2013-12-07 11:44:32',
u'title2': u'100,9MHz',
u'id_album': u'#ID_Album#',
u'title1': u'JOVEM PAN FM',
u'length': u'#Duration#',
u'title3': u'A maior rede de radio do Brasil',
u'title4': u'00:00:00',
u'ISRC': u'#Code#',
u'id_song': u'#ID_Song#',
u'media_id2': u'#ID_SubTitle#',
u'media_id1': u'#ID_Title#',
u'id_jpg': u'#Jpg#',
u'media_id3': u'#ID_Album#',
u'id_singer': u'#ID_Singer#'}
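One caveat with .get(key, {}): the default only applies when the key is absent. xmltodict turns an empty element such as <OnAir/> into None, and None has no .get(), so the chain above can still raise AttributeError. A slightly more defensive variant uses `or {}` at each level; the sketch below uses plain dicts standing in for xmltodict output, and the helper `find_media` is hypothetical.

```python
def find_media(parsed):
    """Return the <media> mapping from xmltodict-style output, or None."""
    # Unlike .get(key, {}), "or {}" also replaces a value that is present but
    # None, which is what xmltodict produces for an empty element like <OnAir/>.
    pulsar = parsed.get("Pulsar") or {}
    on_air = pulsar.get("OnAir") or {}
    return on_air.get("media")


# Plain dicts standing in for xmltodict output (hypothetical samples).
complete = {"Pulsar": {"OnAir": {"media_type": "default",
                                 "media": {"title1": "JOVEM PAN FM"}}}}
empty_on_air = {"Pulsar": {"OnAir": None}}  # shape produced by <Pulsar><OnAir/></Pulsar>

print(find_media(complete))      # {'title1': 'JOVEM PAN FM'}
print(find_media(empty_on_air))  # None

# The chained .get(key, {}) version breaks on the same input:
try:
    empty_on_air.get("Pulsar", {}).get("OnAir", {}).get("media")
except AttributeError as exc:
    print("chained .get failed:", exc)
```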

Much better than XPath, and since the XML source is this small, performance is not a concern. Now we only need to read the values and populate the database; in my case I use a Redis DataModel as storage. I also use the magicdate module to automagically convert the date string into a date object. The complete code is below:

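import sys
import time
import xmltodict
import magicdate
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

# Media is the author's own Redis model (not shown). A plain import works when
# this file is run directly as a script; a relative import (from .models) would fail.
from models import Media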
  
  
class MyHandler(PatternMatchingEventHandler):  
    patterns = ["*.xml"]
  
    def process(self, event):  
        """  
        event.event_type  
            'modified' | 'created' | 'moved' | 'deleted'  
        event.is_directory  
            True | False  
        event.src_path  
            path/to/observed/file  
        """  
  
        with open(event.src_path, 'r') as xml_source:  
            xml_string = xml_source.read()  
            parsed = xmltodict.parse(xml_string)  
            element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')  
            if not element:  
                return  
  
            media = Media(  
                title=element.get('title1'),  
                description=element.get('title3'),  
                media_id=element.get('media_id1'),  
                hour=magicdate.magicdate(element.get('hour')),
                length=element.get('title4')  
            )  
            media.save()  
  
    def on_modified(self, event):  
        self.process(event)  
  
    def on_created(self, event):  
        self.process(event)  
  
  
if __name__ == '__main__':  
    args = sys.argv[1:]  
    observer = Observer()  
    observer.schedule(MyHandler(), path=args[0] if args else '.')  
    observer.start()  
  
    try:  
        while True:  
            time.sleep(1)  
    except KeyboardInterrupt:  
        observer.stop()  
  
    observer.join()  
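The Media model comes from the author's own models module, which the post does not show. To exercise the field mapping without Redis, here is a sketch in which a hypothetical `MediaRecord` dataclass stands in for Media, and datetime.strptime replaces magicdate. That is a swapped-in technique, workable only because the sample's hour value has a fixed 'YYYY-MM-DD HH:MM:SS' layout. The helpers `parse_hour` and `to_record` are also hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MediaRecord:
    """Hypothetical stand-in for the author's Redis-backed Media model."""
    title: Optional[str]
    description: Optional[str]
    media_id: Optional[str]
    hour: Optional[datetime]
    length: Optional[str]


def parse_hour(value):
    """Parse 'YYYY-MM-DD HH:MM:SS'; return None if missing or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def to_record(element):
    """Map the <media> fields the same way the complete script does."""
    return MediaRecord(
        title=element.get("title1"),
        description=element.get("title3"),
        media_id=element.get("media_id1"),
        hour=parse_hour(element.get("hour")),
        length=element.get("title4"),
    )


record = to_record({
    "title1": "JOVEM PAN FM",
    "title3": "A maior rede de radio do Brasil",
    "media_id1": "#ID_Title#",
    "hour": "2013-12-07 11:44:32",
    "title4": "00:00:00",
})
print(record.hour)  # 2013-12-07 11:44:32
```

Keeping the mapping in a plain function like to_record makes it testable without a running observer or database; process() would then only read the file, parse it, and save the result.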

Supplement
* Using Python's Watchdog to monitor changes to a directory

Posted on Thursday, January 18, 2018
