One of TensorFlow’s more awesome parts is definitely TensorBoard, i.e. the capability of collecting and visualizing data from the TensorFlow graph while the network is running, while also being able to display and browse the graph itself. Coming from Caffe, where I eventually wrote my own tooling just to visualize the training loss from logs of the raw console output and had to copy-paste the graph’s prototxt to some online service in order to visualize it, this is a massive step in the best possible direction.
TL;DR: Scroll to the end for an example of using grouped summaries with the Supervisor.
Apart from just storing scalar data for TensorBoard, the histogram feature turned out to be especially valuable to me for observing the performance of a probability inference step.

Here, the left half shows the distribution of ground truth probability values in the training and validation sets over time, whereas the right half shows the actual inferred probabilities over time. It’s not hard to see that the network is getting better, but there is more to it:
The histogram of the ground truth values (here on the left) allows you to verify that your training data is indeed correct. If the data is not balanced, you might learn a network that is biased towards one outcome. If the network does indeed obtain some biased view of the data, you’ll clearly see patterns emerging in the inferred histogram that do not match the expected ground truth distribution. In this example, the right histograms approach the left histograms, so training appears to be working fine.

However, if you only measure network performance as accuracy, i.e. the ratio of correct guesses over all examples, you might be getting the wrong impression: if the input distribution is skewed towards 95% positive and 5% negative examples, a network guessing “positive” 100% of the time produces only 5% error. If your total accuracy is an aggregate over multiple different values, you will definitely miss this, especially since randomized mini-batches only further obscure the issue. Worse, if the learned coefficients run into saturation, learning will stop for them. Again, this might not be obvious if the total loss and accuracy are aggregates over different values.
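To make the accuracy pitfall concrete, here is a tiny sketch (with made-up labels) of how a constant “always positive” classifier scores on such a 95/5 split:

```python
import numpy as np

# Hypothetical, heavily skewed ground truth: 95% positive, 5% negative.
labels = np.array([1] * 95 + [0] * 5)

# A "classifier" that always predicts the positive class.
predictions = np.ones_like(labels)

accuracy = np.mean(predictions == labels)
print(accuracy)  # 0.95; looks great, yet the negative class is never predicted
```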
## Influence of the learning rate

Let’s take the example of a variable learning rate. If at some point the training slows down, it’s not immediately clear whether

- a parameter space optimum has been found and training is done,
- the algorithm found a plateau in parameter space and the loss would continue to fall after a few more hundreds or thousands of iterations, or
- the training is actually diverging because the learning rate is not small enough to enter a local optimum in the first place.

Now, optimizers like Adam are tailored to overcome the problems of fixed learning rates, but they too can only go so far: if the learning rate is too big to begin with, it’s still too big after fine-tuning. Or worse, after a couple of iterations the adjusted weights could end up in saturation and no further change would be able to do anything about it.
To rule out at least one part, you can make the learning rate a changeable parameter of the network, e.g. a function of the training iteration. I had some success with Caffe’s “multi-step” approach of changing the learning rate at fixed iteration numbers, say, reducing it by one decade at iterations 1000, 5000 and 16000, where I determined these values over different training runs of the network.
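A multi-step schedule of this kind is easy to express as a plain Python function that maps the current iteration to a learning rate; the boundaries and rates below are just placeholders:

```python
def multi_step_lr(iteration,
                  base_lr=0.1,
                  boundaries=(1000, 5000, 16000),
                  decay=0.1):
    """Drop the learning rate by `decay` each time a boundary is passed."""
    lr = base_lr
    for boundary in boundaries:
        if iteration >= boundary:
            lr *= decay
    return lr

# multi_step_lr(0)     -> 0.1
# multi_step_lr(1200)  -> 0.01
# multi_step_lr(20000) -> 0.0001
```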
So instead of baking the learning rate into the graph during construction, you would define a placeholder for it and feed the learning rate of the current epoch/iteration into the optimization operation each time you call it, like so:
```python
with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
                    .minimize(t_loss)

with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(),
                    tf.local_variables_initializer())
    sess.run(init)

    for _ in range(0, epochs):
        learning_rate = 0.1
        loss, _ = sess.run([t_loss, op_minimize],
                           feed_dict={p_lr: learning_rate})
```

Alternatively, you could make the learning rate a non-learnable Variable and explicitly assign it whenever it needs to be changed; let’s assume we don’t do that.
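For completeness, here is a minimal sketch of that alternative (the variable and op names are my own): a non-trainable Variable holds the learning rate, and an assign op updates it only when a change is actually needed:

```python
with tf.Graph().as_default() as graph:
    # non-trainable, so the optimizer never touches it
    v_lr = tf.Variable(0.1, trainable=False, dtype=tf.float32,
                       name='learning_rate')
    p_new_lr = tf.placeholder(tf.float32, (), name='new_learning_rate')
    op_set_lr = tf.assign(v_lr, p_new_lr)

    t_loss = tf.reduce_mean(...)
    op_minimize = tf.train.AdamOptimizer(learning_rate=v_lr) \
                    .minimize(t_loss)

# ... later, inside the session, only when the rate should actually change:
# sess.run(op_set_lr, feed_dict={p_new_lr: 0.01})
```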
The first thing I then usually do is to also add summary nodes to track the current learning rate (as well as the training loss):
```python
with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
                    .minimize(t_loss)

    tf.summary.scalar('learning_rate', p_lr)
    tf.summary.scalar('loss', t_loss)

    # histograms work the same way
    tf.summary.histogram('probability', t_some_batch)

    s_merged = tf.summary.merge_all()

writer = tf.summary.FileWriter('log', graph=graph)

with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(),
                    tf.local_variables_initializer())
    sess.run(init)

    for _ in range(0, epochs):
        learning_rate = 0.1
        loss, summary, _ = sess.run([t_loss, s_merged, op_minimize],
                                    feed_dict={p_lr: learning_rate})
        writer.add_summary(summary)
```

Now, for each epoch, the values of the t_loss and p_lr tensors are stored in a protocol buffer file in the log subdirectory. You can then start TensorBoard with the --logdir parameter pointing to it and get a nice visualization of the training progress.
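One small addition I find useful (not part of the snippet above): add_summary() optionally takes a step value, so passing the epoch or iteration counter gives TensorBoard a meaningful x-axis instead of just the order in which events were written:

```python
for epoch in range(0, epochs):
    learning_rate = 0.1
    loss, summary, _ = sess.run([t_loss, s_merged, op_minimize],
                                feed_dict={p_lr: learning_rate})
    # the second argument becomes the step value on TensorBoard's x-axis
    writer.add_summary(summary, epoch)
```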
One example where doing this massively helped me track down errors is exactly the network I took the introductory histogram picture from; here, I set the learning rate to 0.1 for about two hundred iterations before dropping it to 0.01. It turned out that having the learning rate this high for my particular network did result in saturation, and learning effectively stopped. The histogram helped in noticing the issue and the scalar graph helped in determining the “correct” learning rates.
## Training and validation set summaries

Suppose now you want to have different summaries that may or may not appear on different instances of the graph. The learning rate, for example, has no influence on the outcome of the validation batch, so including it in validation runs only eats up time, memory and storage. However, the tf.summary.merge_all() operation doesn’t care where the summaries live per se, and since some summaries depend on nodes from the training graph (e.g. the learning rate placeholder), you suddenly create a dependency on nodes you didn’t want to trigger, with effects of very varying levels of fun.
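To illustrate the kind of surprise this produces, reusing the graph from above: evaluating the merged summary without feeding the learning-rate placeholder fails, even though validation itself doesn’t need it (the exact error wording is from memory):

```python
# s_merged depends on p_lr through the learning_rate summary, so this raises
# something along the lines of:
#   InvalidArgumentError: You must feed a value for placeholder tensor
#   'learning_rate' with dtype float
summary = sess.run(s_merged)

# ... while feeding a dummy value "works", but pollutes the validation logs:
summary = sess.run(s_merged, feed_dict={p_lr: 0.0})
```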
It turns out that summaries can be bundled into collections, e.g. “train” and “test”, by specifying their membership upon construction, so that you can later obtain only those summaries that belong to the specified collections:
```python
with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
                    .minimize(t_loss)

    tf.summary.scalar('learning_rate', p_lr, collections=['train'])
    tf.summary.scalar('loss', t_loss, collections=['train', 'test'])

    # merg
```
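Obtaining and merging the per-collection summaries then works with tf.summary.merge() on the respective graph collection, roughly like this (a sketch; the s_train/s_test names are my own):

```python
# still inside the graph construction block from above:
s_train = tf.summary.merge(tf.get_collection('train'))
s_test = tf.summary.merge(tf.get_collection('test'))

# training step: feed the learning rate and fetch only the train summaries
# loss, summary, _ = sess.run([t_loss, s_train, op_minimize],
#                             feed_dict={p_lr: learning_rate})

# validation step: no learning-rate dependency left in the fetched summaries
# summary = sess.run(s_test)
```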