Thursday, January 18, 2018

[ Python Article Collection ] Using Python's Watchdog to monitor changes to a directory

Source From Here 
Preface 
Watchdog is a handy Python package that watches the filesystem for changes; on Linux it uses the inotify kernel subsystem. This makes it an excellent foundation for a small script that takes action whenever a file arrives in a directory or any of the directory's contents change. An example might be a client-facing SFTP server where you want to receive an email whenever a file is uploaded. 

You can install the package with the command below: 
# pip install watchdog


How to 
First create the monitoring script. It keeps running in a loop and observes any changes to the given directory. In that script three modules/classes will be used: 
* time from the Python standard library is used to sleep the main loop
* watchdog.observers.Observer is the class that watches for any change and then dispatches the event to the specified handler
* watchdog.events.PatternMatchingEventHandler is the class that takes the event dispatched by the observer and performs some action

- watch_for_changes.py 
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
PatternMatchingEventHandler inherits from FileSystemEventHandler and exposes some useful methods: 
* on_any_event: if defined, executed for any event
* on_created: executed when a file or a directory is created
* on_modified: executed when a file or a directory is modified
* on_moved: executed when a file or a directory is moved or renamed
* on_deleted: executed when a file or a directory is deleted

Each one of those methods receives the event object as first parameter, and the event object has 3 attributes: 
* event_type: 'modified' | 'created' | 'moved' | 'deleted'
* is_directory: True | False
* src_path: path/to/observed/file
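
To make the relationship between the event attributes and the on_* methods concrete, here is a simplified, standard-library-only sketch of the dispatch idea. FakeEvent and MiniHandler are hypothetical stand-ins invented for illustration. This is not watchdog's actual implementation, only an approximation of how an event's event_type could be routed to the matching method.

```python
from collections import namedtuple

# Hypothetical stand-in for a watchdog event, carrying the same three attributes.
FakeEvent = namedtuple("FakeEvent", ["event_type", "is_directory", "src_path"])


class MiniHandler(object):
    """Toy handler: routes an event to on_<event_type> if that method exists."""

    def __init__(self):
        self.seen = []  # records which callbacks ran, for inspection

    def dispatch(self, event):
        # on_any_event always runs, then the type-specific method if defined
        self.on_any_event(event)
        method = getattr(self, "on_" + event.event_type, None)
        if method is not None:
            method(event)

    def on_any_event(self, event):
        self.seen.append(("any", event.src_path))

    def on_created(self, event):
        self.seen.append(("created", event.src_path))


handler = MiniHandler()
handler.dispatch(FakeEvent("created", False, "/tmp/test.xml"))
handler.dispatch(FakeEvent("deleted", False, "/tmp/test.xml"))  # no on_deleted defined
print(handler.seen)
```

The second event only reaches on_any_event, which mirrors the point that a handler ignores any event type it does not implement a method for.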

So to create a handler, just inherit from one of the existing handlers. For this example, PatternMatchingEventHandler will be used to match only XML files. To keep things simple, I will put the file processing in a single method and implement only on_modified and on_created, which means that my handler will ignore any other events. 

The patterns attribute is also defined so that only files with the xml or lxml extension are watched. 
class MyHandler(PatternMatchingEventHandler):
    # only paths matching one of these glob patterns are handled
    patterns = ["*.xml", "*.lxml"]

    def process(self, event):
        """
        event.event_type
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """
        # the file will be processed here
        # for now just print, only for debugging
        print("%s %s" % (event.src_path, event.event_type))

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)
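
The patterns are shell-style globs. As a rough illustration of that filtering, the sketch below uses the standard library's fnmatch as a stand-in (watchdog's own matcher may differ in details such as case sensitivity), and matches_any is a hypothetical helper. It also shows why the comma between the two patterns matters: Python silently concatenates adjacent string literals.

```python
from fnmatch import fnmatch


def matches_any(path, patterns):
    """Return True if path matches at least one glob pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


good = ["*.xml", "*.lxml"]
# Without the comma, the two literals merge into the single pattern "*.xml*.lxml".
merged = ["*.xml" "*.lxml"]

print(matches_any("/tmp/test.xml", good))    # True
print(matches_any("/tmp/notes.txt", good))   # False
print(matches_any("/tmp/test.xml", merged))  # False
```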
With the above handler, only creation and modification will be watched. Now the Observer needs to be scheduled. 
import sys  # for sys.argv; goes with the other imports at the top of the file

if __name__ == '__main__':
    args = sys.argv[1:]
    observer = Observer()
    # watch the directory given as first argument, or the current directory
    observer.schedule(MyHandler(), path=args[0] if args else '.')
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()
Notes. 
You can set the named argument "recursive" to True in observer.schedule if you want to watch for files in subfolders.

That's all that is needed to watch for modifications in the given directory. The script takes the path given as first parameter, or the current directory by default. 
# python watch_for_changes.py /path/to/directory

Let it run in a shell, then open another shell or the file browser to change or create .xml files in the watched directory:
# echo "testing" > /tmp/test.xml

Since the handler is printing the results, the output should be: 
/tmp/test.xml created
/tmp/test.xml modified

Now, to complete the script, we only need to implement the necessary logic in the process method to parse the file and insert it into the database. For example, suppose the XML file contains some data about the current track on a web radio. 


The easiest way to parse this small XML is to use the xmltodict library. 

# pip install xmltodict

With the xmltodict.parse function, that XML is returned as an OrderedDict:
OrderedDict([(u'Pulsar',
    OrderedDict([(u'OnAir',
        OrderedDict([(u'media_type', u'default'),
        (u'media',
            OrderedDict([(u'title1', u'JOVEM PAN FM'),
                         (u'title2', u'100,9MHz'),
                         (u'title3', u'A maior rede de radio do Brasil'),
                         (u'title4', u'00:00:00'),
                         (u'media_id1', u'#ID_Title#'),
                         (u'media_id2', u'#ID_SubTitle#'),
                         (u'media_id3', u'#ID_Album#'),
                         (u'hour', u'2013-12-07 11:44:32'),
                         (u'length', u'#Duration#'),
                         (u'ISRC', u'#Code#'),
                         (u'id_singer', u'#ID_Singer#'),
                         (u'id_song', u'#ID_Song#'),
                         (u'id_album', u'#ID_Album#'),
                         (u'id_jpg', u'#Jpg#')]))]))]))])
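
The sample file itself is not reproduced here, so the following is a reconstruction inferred from the parsed output above, trimmed to a few fields, and should be treated as an assumption about its shape. To stay dependency-free, the sketch walks it with the standard library's xml.etree.ElementTree rather than xmltodict. The to_dict helper is hypothetical and only handles this simple case (no attributes, no repeated tags).

```python
import xml.etree.ElementTree as ET

# Assumed shape of the input, rebuilt from the parsed output above (fields trimmed).
SAMPLE_XML = """
<Pulsar>
  <OnAir>
    <media_type>default</media_type>
    <media>
      <title1>JOVEM PAN FM</title1>
      <title2>100,9MHz</title2>
      <title3>A maior rede de radio do Brasil</title3>
      <hour>2013-12-07 11:44:32</hour>
    </media>
  </OnAir>
</Pulsar>
"""


def to_dict(element):
    """Convert an element to nested dicts; leaf elements become their text."""
    children = list(element)
    if not children:
        return element.text
    return {child.tag: to_dict(child) for child in children}


root = ET.fromstring(SAMPLE_XML.strip())
parsed = {root.tag: to_dict(root)}
print(parsed["Pulsar"]["OnAir"]["media"]["title1"])  # JOVEM PAN FM
```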
Now we can just access that dict to create the record on the filesystem or somewhere else. Notice that I use the get method of the dict type a lot to avoid KeyErrors:
# body of the process method
with open(event.src_path, 'r') as xml_source:
    xml_string = xml_source.read()
    parsed = xmltodict.parse(xml_string)
    element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
    if not element:
        return
    print(dict(element))
and the output will be: 
{u'hour': u'2013-12-07 11:44:32',
u'title2': u'100,9MHz',
u'id_album': u'#ID_Album#',
u'title1': u'JOVEM PAN FM',
u'length': u'#Duration#',
u'title3': u'A maior rede de radio do Brasil',
u'title4': u'00:00:00',
u'ISRC': u'#Code#',
u'id_song': u'#ID_Song#',
u'media_id2': u'#ID_SubTitle#',
u'media_id1': u'#ID_Title#',
u'id_jpg': u'#Jpg#',
u'media_id3': u'#ID_Album#',
u'id_singer': u'#ID_Singer#'}
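
The chained get calls are what keep a malformed or unrelated XML file from crashing the handler. A minimal sketch of that behavior with plain dicts (find_media is a hypothetical helper, and the inputs are made up for illustration):

```python
def find_media(parsed):
    """Return the media mapping, or None if any level is missing."""
    return parsed.get('Pulsar', {}).get('OnAir', {}).get('media')


complete = {'Pulsar': {'OnAir': {'media': {'title1': 'JOVEM PAN FM'}}}}
missing_onair = {'Pulsar': {}}
unrelated = {'Other': {}}

print(find_media(complete))       # {'title1': 'JOVEM PAN FM'}
print(find_media(missing_onair))  # None
print(find_media(unrelated))      # None

# By contrast, direct indexing raises KeyError on the same input:
try:
    unrelated['Pulsar']['OnAir']['media']
except KeyError as error:
    print("KeyError: %s" % error)  # KeyError: 'Pulsar'
```

Note that the default only applies when a key is absent. If a key is present but its value is None (which, as far as I know, is what xmltodict produces for an empty element), the next get in the chain raises AttributeError instead.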

Much better than XPath, and for this particular case, where the XML source is small, there will be no relevant performance issue. Now we only need to get the values and populate the database; in my case I will use Redis DataModel as storage. I will also use the magicdate module to automagically convert the date string to a datetime object. The complete code is below: 
import sys
import time
import xmltodict
# assuming the magicdate module exposes a magicdate() function
from magicdate import magicdate
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

# Media is the author's own model class; its definition is not shown here
from .models import Media


class MyHandler(PatternMatchingEventHandler):
    patterns = ["*.xml"]

    def process(self, event):
        """
        event.event_type
            'modified' | 'created' | 'moved' | 'deleted'
        event.is_directory
            True | False
        event.src_path
            path/to/observed/file
        """

        with open(event.src_path, 'r') as xml_source:
            xml_string = xml_source.read()
            parsed = xmltodict.parse(xml_string)
            element = parsed.get('Pulsar', {}).get('OnAir', {}).get('media')
            if not element:
                return

            media = Media(
                title=element.get('title1'),
                description=element.get('title3'),
                media_id=element.get('media_id1'),
                hour=magicdate(element.get('hour')),
                length=element.get('title4')
            )
            media.save()

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)


if __name__ == '__main__':
    args = sys.argv[1:]
    observer = Observer()
    # watch the directory given as first argument, or the current directory
    observer.schedule(MyHandler(), path=args[0] if args else '.')
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()
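
If pulling in magicdate is not desirable, the hour field in this sample has a fixed format, so the standard library can parse it. The sketch below is an alternative to the magicdate call, not what the original code does. The parse_hour helper is hypothetical and returns None for values it cannot parse, such as unfilled placeholders like '#Duration#'.

```python
from datetime import datetime


def parse_hour(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string; return None when it is missing or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


print(parse_hour("2013-12-07 11:44:32"))  # 2013-12-07 11:44:32
print(parse_hour("#Duration#"))           # None
print(parse_hour(None))                   # None
```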
Supplement 
Using Python's Watchdog to monitor changes to a directory
