In Nature, DeepMind published "Mastering the game of Go without human knowledge", introducing AlphaGo Zero: starting completely from scratch, consulting no human game records, using only black and white stones on a 19×19 board and the explicit rules of Go, it learned through tabula rasa ("blank slate") learning:
[Self-training] 3 days: defeated AlphaGo Lee 100:0.
[Self-training] 3 weeks: defeated AlphaGo Master.
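The tabula rasa idea can be illustrated with a toy sketch. This is emphatically not DeepMind's algorithm (AlphaGo Zero combines a deep residual network with Monte Carlo tree search); it is only the same "blank slate" principle applied to tic-tac-toe: the program starts from fully random play, knows nothing but the rules, and learns state values purely from the outcomes of its own self-play games. All names and parameters below are the author's own choices for illustration.

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X'/'O' for three in a row, 'draw' if the board is full, else None."""
    for i, j, k in LINES:
        if board[i] != '.' and board[i] == board[j] == board[k]:
            return board[i]
    return 'draw' if '.' not in board else None

def legal_moves(board):
    return [i for i, c in enumerate(board) if c == '.']

def choose_move(board, player, V, rng, epsilon):
    """Epsilon-greedy over afterstate values; unseen states default to 0.5 (blank slate)."""
    moves = legal_moves(board)
    if rng.random() < epsilon:
        return rng.choice(moves)
    def value(m):
        after = board[:m] + player + board[m+1:]
        return V.get((after, player), 0.5)
    return max(moves, key=value)

def train(n_games=30000, alpha=0.2, seed=0):
    """Self-play Monte Carlo value learning, starting from completely random play."""
    V, rng = {}, random.Random(seed)
    for g in range(n_games):
        # Fully random at first, then increasingly greedy as values accumulate.
        epsilon = max(0.05, 1.0 - g / (n_games / 2))
        board, player, history, result = '.' * 9, 'X', [], None
        while result is None:
            m = choose_move(board, player, V, rng, epsilon)
            board = board[:m] + player + board[m+1:]
            history.append((board, player))
            result = winner(board)
            player = 'O' if player == 'X' else 'X'
        # Pull every afterstate's value toward the game's final outcome.
        for state, p in history:
            target = 0.5 if result == 'draw' else (1.0 if result == p else 0.0)
            old = V.get((state, p), 0.5)
            V[(state, p)] = old + alpha * (target - old)
    return V

def evaluate(V, n_games=1000, seed=1):
    """Greedy learned player as X versus a uniformly random O."""
    rng = random.Random(seed)
    wins = losses = 0
    for _ in range(n_games):
        board, player, result = '.' * 9, 'X', None
        while result is None:
            if player == 'X':
                m = choose_move(board, 'X', V, rng, epsilon=0.0)
            else:
                m = rng.choice(legal_moves(board))
            board = board[:m] + player + board[m+1:]
            result = winner(board)
            player = 'O' if player == 'X' else 'X'
        wins += result == 'X'
        losses += result == 'O'
    return wins, losses

if __name__ == '__main__':
    V = train()
    wins, losses = evaluate(V)
    print(f"learned player vs random: {wins} wins, {losses} losses out of 1000")
```

The point of the sketch is the training signal: no human games are ever consulted, only the win/loss/draw results of the program's own play, which is what "without human knowledge" refers to in the paper's title.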
Image from DeepMind
DeepMind founder and CEO Demis Hassabis and AlphaGo team lead David Silver describe AlphaGo Zero this way:
We introduce AlphaGo Zero, the latest evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is arguably the strongest Go player in history. Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0. (English text from the DeepMind website)
The description is striking; let us translate it:
We introduce AlphaGo Zero, an upgraded version of AlphaGo. AlphaGo was the first computer program to defeat a world champion at the ancient Chinese game of Go. AlphaGo Zero is even stronger, arguably the strongest Go player in history (Nie Weiping calls it "Teacher Alpha"). Earlier versions of AlphaGo first trained on thousands of human amateur and professional game records to learn how to play Go well.
What surprised us: AlphaGo Zero skips the step of learning from human game records.
It starts from completely random play (I am genuinely curious about its very first game record. Playing against itself? Can it be found anywhere? Where did it place the first stone?). The phrase "by playing games against itself" is easy to misread: it sounds as if the program simply plays Go against itself, but in reality it is not quite that; in reality it sparred with AlphaGo under the guidance of the programming engineers. After playing game after game against AlphaGo (presumably losing constantly at first), AlphaGo Zero quickly surpassed human level and finally defeated AlphaGo by 100:0.
We were also surprised to find:
AlphaGo Zero's sparring partner was AlphaGo, not me; otherwise it would hardly have learned this fast.
If I had the chance to spar with Ke Jie every day, might I too quickly become a Go master?
Is there a flaw in DeepMind's "blank slate" claim?
In my view, genuine tabula rasa learning would mean two AlphaGo Zero instances, each with no human data whatsoever, training only against each other; who knows how long they would need before one could beat AlphaGo. Has DeepMind released the data from that experiment? I have not seen it, and not releasing it would be less than forthright, since the result could be obtained in next to no time. Given the paper's claim that "AlphaGo Zero skips this step and learns to play simply by playing games against itself", is the phrase "against itself" then problematic?
Playing the strong makes you stronger; having a high-level opponent really matters.
One more figure was either never published or escaped my notice: exactly how many sparring games did it take to reach AlphaGo's level? The record of the first game it ever won should also be well worth studying; has it been made public?