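Preface
When working with servers, we sometimes need to monitor changes to files or folders, for example by periodically scanning a folder to see whether files of a certain type have been created.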
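As a baseline for comparison, the periodic scan mentioned above can be sketched with the standard library alone. This is only an illustration under assumptions: the helper names `snapshot` and `new_files` and the temporary demo directory are hypothetical and not part of any library discussed here.

```python
import os
import tempfile


def snapshot(directory, suffix):
    """Return the set of file names in `directory` that end with `suffix`."""
    return {
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(suffix)
    }


def new_files(previous, current):
    """Names present in the current scan but not in the previous one."""
    return sorted(current - previous)


# Demonstration in a throwaway directory: scan, create files, scan again.
with tempfile.TemporaryDirectory() as demo_dir:
    before = snapshot(demo_dir, ".xls")
    for name in ("a.xls", "b.xls", "c.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    after = snapshot(demo_dir, ".xls")
    created = new_files(before, after)
    print(created)  # ['a.xls', 'b.xls']
```

A scan like this only sees the state at scan time, so a file that is created and deleted between two scans goes unnoticed. Event-based monitoring avoids that gap.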
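Today we look at how to monitor the file system with Python.
Python has two main libraries for file system monitoring:
pyinotify: https://github.com/seb-m/pyinotify/wiki
pyinotify relies on Linux's inotify, so it can only be used on Linux.
watchdog: http://pythonhosted.org/watchdog/
watchdog is a cross-platform library (Linux, Windows, macOS) that wraps each platform's native mechanism, but it requires additional dependencies to be installed.
Most of the servers we use run Linux, so to avoid the hassle of installing dependencies, we use pyinotify to implement file system monitoring.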
Introduction
1. Installation
Dependencies
Linux kernel with inotify ≥ 2.6.13
Python ≥ 2.4 or 3.x
Install
    pip install pyinotify
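The kernel requirement above can be checked before installing. The following is a minimal sketch, assuming the release string reported by `platform.release()` starts with dotted numbers (as in `5.15.0-91-generic`); the helper names `parse_kernel_release` and `supports_inotify` are hypothetical.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract the leading numeric parts of a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing third component (e.g. '6.8') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def supports_inotify(system, release, minimum=(2, 6, 13)):
    """True when the platform is Linux and the kernel is at least `minimum`."""
    return system == "Linux" and parse_kernel_release(release) >= minimum


# Check the machine this snippet runs on.
print(supports_inotify(platform.system(), platform.release()))
```

On any other operating system the check returns False without parsing the release string, which matches the Linux-only nature of pyinotify.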
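Usage
Suppose we want to watch /tmp and all of its subdirectories for every file creation or deletion. To keep things simple, we just print a message for each notification to standard output.
There are several strategies for reading events and acting on them. Four patterns are common: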
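1. Use the Notifier class's loop() method to monitor continuously. In this case, the current process is dedicated to this task.
2. Use a timeout in Notifier and explicitly call the processing methods whenever convenient, to monitor periodically. This does not block the current thread, but it may cause a small loss of responsiveness, or increased resource usage if the timeout is set too small.
3. Spawn a new thread that monitors file events independently, by instantiating ThreadedNotifier.
4. Monitor asynchronously through the AsyncNotifier class.
Which strategy to use depends mainly on your needs and your environment. Below we implement all four.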
Shared code
    import pyinotify

    # The watch manager stores the watches and provides operations on watches
    wm = pyinotify.WatchManager()

    # Events to watch
    mask = pyinotify.IN_DELETE | pyinotify.IN_CREATE


    class EventHandler(pyinotify.ProcessEvent):
        def process_IN_CREATE(self, event):
            print("Creating:", event.pathname)

        def process_IN_DELETE(self, event):
            print("Removing:", event.pathname)
We define an EventHandler class that inherits from the ProcessEvent event-processing class.
For a given EVENT_TYPE, the corresponding process_EVENT_TYPE method is executed.
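The name-based dispatch described above can be modeled in a few lines. This is a simplified toy, not pyinotify's actual implementation: `MiniProcessEvent` and `Handler` are hypothetical classes, and events are reduced to a name and a path. It mirrors the lookup order suggested by the handler names used in this article (exact name, then event family such as IN_CLOSE, then process_default).

```python
class MiniProcessEvent:
    """Toy stand-in for an event processor; not pyinotify's real class."""

    def __call__(self, maskname, pathname):
        # 1. Exact match, e.g. process_IN_CLOSE_WRITE
        method = getattr(self, "process_" + maskname, None)
        if method is None:
            # 2. Family match, e.g. process_IN_CLOSE for IN_CLOSE_WRITE
            family = "_".join(maskname.split("_")[:2])
            method = getattr(self, "process_" + family, None)
        if method is None:
            # 3. Fall back to the catch-all handler
            method = self.process_default
        return method(pathname)

    def process_default(self, pathname):
        return "default: " + pathname


class Handler(MiniProcessEvent):
    def process_IN_CREATE(self, pathname):
        return "created: " + pathname

    def process_IN_CLOSE(self, pathname):
        return "closed: " + pathname


handler = Handler()
print(handler("IN_CREATE", "/tmp/a.xls"))       # created: /tmp/a.xls
print(handler("IN_CLOSE_WRITE", "/tmp/a.xls"))  # closed: /tmp/a.xls
print(handler("IN_MODIFY", "/tmp/a.xls"))       # default: /tmp/a.xls
```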
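EVENT_TYPE can be any inotify event name, including IN_ACCESS, IN_MODIFY, IN_ATTRIB, IN_CLOSE_WRITE, IN_CLOSE_NOWRITE, IN_OPEN, IN_MOVED_FROM, IN_MOVED_TO, IN_CREATE, IN_DELETE, IN_DELETE_SELF, and IN_MOVE_SELF.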
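Each event type is a single bit, which is why a watch mask such as IN_DELETE | IN_CREATE is built with bitwise OR. The sketch below illustrates this with a hypothetical `Event` enum that reproduces three of the bit values from the Linux inotify header; it stands in for the pyinotify constants so that it runs without the library.

```python
import enum


class Event(enum.IntFlag):
    """A few inotify event bits, with the values defined in <sys/inotify.h>."""
    IN_MODIFY = 0x002
    IN_CREATE = 0x100
    IN_DELETE = 0x200


def names_in(mask):
    """List the names of the known event bits that are set in `mask`."""
    return [event.name for event in Event if mask & event]


mask = Event.IN_DELETE | Event.IN_CREATE
print(hex(mask))                     # 0x300
print(names_in(mask))                # ['IN_CREATE', 'IN_DELETE']
print(bool(mask & Event.IN_MODIFY))  # False
```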
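1. Notifier without a timeout
    # Create the event handler object
    handler = EventHandler()
    notifier = pyinotify.Notifier(wm, handler)
    wdd = wm.add_watch('/tmp', mask, rec=True)
    notifier.loop()
After creating the handler and the notifier, we add the directory to watch:
    wdd = wm.add_watch('/tmp', mask, rec=True)
Finally, we start monitoring. The program keeps running until it is terminated with Ctrl+C:
    notifier.loop()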
2. Notifier with a timeout
If you prefer to check for events periodically rather than block waiting for them, construct the Notifier with a short timeout:
    notifier = pyinotify.Notifier(wm, handler, timeout=10)
and check for and process events with a function like this:
    def quick_check(notifier):
        assert notifier._timeout is not None, 'Notifier must be constructed with a short timeout'
        notifier.process_events()
        while notifier.check_events():  # loop in case more events appear while we are processing
            notifier.read_events()
            notifier.process_events()
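How such a check fits into a program's own loop might look like the sketch below. The function `run_with_periodic_checks` and the two lambda stand-ins are hypothetical; they replace the real work and the real event check so the example runs without pyinotify.

```python
import time


def run_with_periodic_checks(do_work, check_events, rounds, pause=0.0):
    """Alternate between the application's own work and a non-blocking event check."""
    for _ in range(rounds):
        do_work()
        check_events()
        time.sleep(pause)


# Stand-ins so the sketch runs anywhere; both callables are hypothetical.
log = []
run_with_periodic_checks(
    do_work=lambda: log.append("work"),
    check_events=lambda: log.append("check"),
    rounds=2,
)
print(log)  # ['work', 'check', 'work', 'check']
```

With a real notifier constructed with a timeout, `check_events` would presumably be something like `lambda: quick_check(notifier)`.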
3. ThreadedNotifier class
    notifier = pyinotify.ThreadedNotifier(wm, EventHandler())
    notifier.start()
    wdd = wm.add_watch('/tmp', mask, rec=True)
    # rm_watch takes an int or a list of ints, so convert the dict view on Python 3
    wm.rm_watch(list(wdd.values()))
    notifier.stop()
Of course, we can also remove a watch when needed:
    if wdd['/tmp'] > 0:  # test if the wd is valid, i.e. if /tmp is being watched; this test is not mandatory though
        wm.rm_watch(wdd['/tmp'])
Note: if /tmp contains subdirectories, they are watched as well. To remove the watches on /tmp and its subdirectories, use rec=True:
    wm.rm_watch(wdd['/tmp'], rec=True)
or:
    wm.rm_watch(list(wdd.values()))
Once we are done with the directory, we finally need to stop the notifier:
    notifier.stop()
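Because a threaded notifier keeps running until stop() is called, it can help to tie the two calls together. The sketch below is one possible way to do that with a context manager; `started` and `FakeNotifier` are hypothetical names, and the fake object only records calls so the example runs without pyinotify. It assumes nothing about the notifier beyond the start() and stop() methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def started(notifier):
    """Start a thread-like notifier and guarantee stop() is called afterwards."""
    notifier.start()
    try:
        yield notifier
    finally:
        notifier.stop()


class FakeNotifier:
    """Hypothetical stand-in that only records calls, so the sketch runs anywhere."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


fake = FakeNotifier()
try:
    with started(fake):
        raise RuntimeError("something went wrong while watching")
except RuntimeError:
    pass
print(fake.calls)  # ['start', 'stop']
```

Even though the body raised an exception, stop() was still called, so the monitoring thread would not be left running.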
4. AsyncNotifier class
    import asyncore

    notifier = pyinotify.AsyncNotifier(wm, EventHandler())
    wdd = wm.add_watch('/tmp', mask, rec=True)
    asyncore.loop()
The AsyncNotifier class relies on the Python standard module asyncore and is an alternative way of polling for events.
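One caveat worth checking: asyncore was deprecated in Python 3.6 and removed from the standard library in Python 3.12, so this strategy depends on the interpreter version. The small sketch below tests whether the module can be found; the helper name `asyncore_available` is hypothetical.

```python
import importlib.util
import sys


def asyncore_available():
    """Report whether the asyncore module can be found by this interpreter."""
    return importlib.util.find_spec("asyncore") is not None


print(sys.version_info[:2], asyncore_available())
```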
Example
    import pyinotify
    import sys


    class EventHandler(pyinotify.ProcessEvent):
        def my_init(self, file_object=sys.stdout):
            """
            This is your constructor. It is automatically called from
            ProcessEvent.__init__(), and extra arguments passed to __init__()
            are delegated automatically to my_init().
            """
            self._file_object = file_object
            self.sign = False

        def process_IN_CREATE(self, event):
            """
            This method processes a specific type of event: IN_CREATE.
            event is an instance of Event.
            """
            if hasattr(event, 'pathname') and event.pathname.endswith('.xls'):
                print('%s have been created.\n' % event.pathname, file=self._file_object)

        def process_IN_DELETE(self, event):
            """
            This method processes a specific type of event: IN_DELETE.
            event is an instance of Event.
            """
            print('deleting: %s\n' % event.pathname, file=self._file_object)

        def process_IN_CLOSE(self, event):
            """
            This method is called on these events: IN_CLOSE_WRITE and
            IN_CLOSE_NOWRITE.
            """
            # print('closing: %s\n' % event.pathname, file=self._file_object)
            pass

        def process_IN_CLOSE_WRITE(self, event):
            """
            This method processes a specific type of event: IN_CLOSE_WRITE.
            """
            pass

        def process_default(self, event):
            """
            Eventually, this method is called for all other types of events.
            This method can be useful when an action fits all events.
            """
            # print('default processing\n', file=self._file_object)
            pass


    def watching(path, exclude_path=None, rec=False, read_freq=0, timeout=None):
        """
        Watch files or directories.
        @args:
            path: str or list of str. Path to watch; the path can be either a
                file or a directory. Also accepts a sequence (list) of paths.
            exclude_path: predicate (boolean function) that returns True if the
                current path must be excluded from being watched. This argument
                has precedence over exclude_filter passed to the class'
                constructor.
            rec: recursively add watches from path on all its subdirectories,
                set to False by default (doesn't follow symlinks in any case).
            read_freq: if read_freq == 0, events are read asap; if read_freq
                is > 0, this thread sleeps
                max(0, read_freq - (timeout / 1000)) seconds. But if timeout
                is None it may be different because poll is blocking waiting
                for something to read.
            timeout: see read_freq above. If provided, it must be set in
                milliseconds.
        """
        # Instantiate a new WatchManager (will be used to store watches)
        wm = pyinotify.WatchManager()
        # Event types
        mask = pyinotify.IN_DELETE | pyinotify.IN_CREATE
        # Associate this WatchManager with a Notifier (will be used to report
        # and process events)
        notifier = pyinotify.Notifier(wm, EventHandler(), read_freq=read_freq, timeout=timeout)
        # Add a new watch on 'path' for the events in mask
        if isinstance(path, (str, list)):
            print("now starting monitor %s." % path)
            wm.add_watch(path, mask, rec=rec, exclude_filter=exclude_path)
        else:
            raise ValueError("the %s seems not valid path" % (path,))
        # Loop forever and handle events
        notifier.loop()


    if __name__ == '__main__':
        # Replace with a real directory. '~' is not expanded here, so use an
        # absolute path or os.path.expanduser() if needed.
        path = '~/jupyter/Others/Wechat'
        watching(path)
This script monitors the creation of .xls files and the deletion of any file in the directory.
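If more than one file type should trigger the creation message, the suffix test can be factored into a small helper. The sketch below is only an illustration: the function name `has_watched_suffix`, the extra `.xlsx` suffix, and the case-insensitive comparison are assumptions, not part of the script above.

```python
def has_watched_suffix(pathname, suffixes=(".xls", ".xlsx")):
    """True when `pathname` ends with one of `suffixes` (case-insensitive)."""
    return pathname.lower().endswith(tuple(suffixes))


for name in ("a.xls", "B.XLSX", "c.txt"):
    print(name, has_watched_suffix(name))
```

Inside a handler, the condition `event.pathname.endswith('.xls')` could then be replaced by `has_watched_suffix(event.pathname)`.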
Run
    $ python watching.py
    now starting monitor ~/jupyter/Others/Wechat.
Create some files
    touch a.xls b.xls c.txt
Output
    ~/jupyter/Others/Wechat/a.xls have been created.
    ~/jupyter/Others/Wechat/b.xls have been created.
Delete the files
    ls | egrep '.*xls|txt' | xargs rm
Output
    deleting: ~/jupyter/Others/Wechat/a.xls
    deleting: ~/jupyter/Others/Wechat/b.xls
    deleting: ~/jupyter/Others/Wechat/c.txt
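Summary
The script above mainly watches for files being created and deleted in a directory. To watch other events, modify the script and add the corresponding event handlers.
To monitor at regular intervals instead, set the read_freq and timeout parameters.
The sleep time between reads is computed as:
max(0, read_freq - (timeout / 1000))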
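As a worked example of the formula, with read_freq in seconds and timeout in milliseconds as described in the script's docstring, the sketch below simply evaluates the expression. The function name `sleep_seconds` is hypothetical, and the real sleep time inside pyinotify may differ, for instance when timeout is None.

```python
def sleep_seconds(read_freq, timeout_ms):
    """Evaluate max(0, read_freq - timeout / 1000) for a timeout given in milliseconds."""
    return max(0, read_freq - (timeout_ms / 1000))


print(sleep_seconds(5, 2000))  # 3.0
print(sleep_seconds(1, 3000))  # 0
```

So with read_freq=5 and timeout=2000 the thread sleeps about 3 seconds between reads, and once the timeout exceeds read_freq the sleep drops to zero.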
The script is available on GitHub:
https://github.com/dxsbiocc/learn/blob/main/Python/utils/watching.py