Understanding Black-box Predictions via Influence Functions. Pang Wei Koh and Percy Liang.

In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks. See more on this video at https://www.microsoft.com/en-us/research/video/understanding-black-box-predictions-via-influence-functions/

A classic result tells us that the influence of upweighting z on the parameters \hat{\theta} is given by \mathcal{I}_{\text{up,params}}(z) = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}).

(a) What is the effect of the training loss and H_{\hat{\theta}}^{-1} terms in \mathcal{I}_{\text{up,loss}}?

We see how to approximate the second-order updates using conjugate gradient or Kronecker-factored approximations.
Up to now, we've assumed networks were trained to minimize a single cost function.
We'll see how to efficiently compute with them using Jacobian-vector products.
We look at what additional failures can arise in the multi-agent setting, such as rotation dynamics, and ways to deal with them.

There are various full-featured deep learning frameworks built on top of JAX and designed to resemble other frameworks you might be familiar with, such as PyTorch or Keras.
ICML 2017 best paper, by Pang Wei Koh and Percy Liang (Stanford).

Upweighting a training point z by a small \epsilon gives the parameters

\[
\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L\left(z_{i}, \theta\right) + \epsilon L(z, \theta)
\]

The influence of upweighting z on the parameters is

\[
\mathcal{I}_{\text{up,params}}(z) \stackrel{\text{def}}{=} \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})
\]

where H_{\hat{\theta}} is the Hessian of the training loss at \hat{\theta}. By the chain rule, the influence of upweighting z on the loss at a test point z_{\text{test}} is

\[
\begin{aligned}
\mathcal{I}_{\text{up,loss}}\left(z, z_{\text{test}}\right) &\stackrel{\text{def}}{=} \left.\frac{d L\left(z_{\text{test}}, \hat{\theta}_{\epsilon, z}\right)}{d \epsilon}\right|_{\epsilon=0} \\
&= \left.\nabla_{\theta} L\left(z_{\text{test}}, \hat{\theta}\right)^{\top} \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} \\
&= -\nabla_{\theta} L\left(z_{\text{test}}, \hat{\theta}\right)^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})
\end{aligned}
\]

Removing z corresponds to upweighting it by \epsilon = -1/n.

For a training point z = (x, y), define the perturbed point z_{\delta} \stackrel{\text{def}}{=} (x+\delta, y). Moving \epsilon mass from z onto z_{\delta} gives

\[
\hat{\theta}_{\epsilon, z_{\delta},-z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L\left(z_{i}, \theta\right) + \epsilon L\left(z_{\delta}, \theta\right) - \epsilon L(z, \theta)
\]

\[
\begin{aligned}
\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta},-z}}{d \epsilon}\right|_{\epsilon=0} &= \mathcal{I}_{\text{up,params}}\left(z_{\delta}\right) - \mathcal{I}_{\text{up,params}}(z) \\
&= -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right)
\end{aligned}
\]

When \delta is small, the effect of \delta can be approximated further:

\[
\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta},-z}}{d \epsilon}\right|_{\epsilon=0} \approx -H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta
\]

so that, with \epsilon = 1/n,

\[
\hat{\theta}_{z_{\delta},-z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta
\]
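As a sanity check on the upweighting formulas above, the following sketch evaluates \mathcal{I}_{\text{up,params}} and \mathcal{I}_{\text{up,loss}} on a one-parameter least-squares problem and compares the predicted effect of removing a point (\epsilon = -1/n) with actual retraining. The model, the data, and every function name here are illustrative assumptions, not taken from the paper's code.

```python
# Toy check of the upweighting formulas on 1-D least squares (illustrative only).
# Assumed model: y ~ theta * x, with L(z, theta) = 0.5 * (theta * x - y) ** 2.

def fit(points):
    """Minimiser of the mean loss: theta = sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def grad(point, theta):
    """dL/dtheta for a single example."""
    x, y = point
    return (theta * x - y) * x

def hessian(points):
    """Second derivative of the mean training loss (a scalar here)."""
    return sum(x * x for x, _ in points) / len(points)

def influence_up_params(z, points, theta):
    """I_up,params(z) = -H^{-1} * grad L(z, theta)."""
    return -grad(z, theta) / hessian(points)

def influence_up_loss(z, z_test, points, theta):
    """I_up,loss(z, z_test) = -grad L(z_test) * H^{-1} * grad L(z)."""
    return grad(z_test, theta) * influence_up_params(z, points, theta)

# Hypothetical data: the first point is an outlier with small x.
train = [(1.0, 3.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.9)]
z_test = (2.5, 2.4)
n = len(train)
theta_hat = fit(train)
z = train[0]

# Removing z corresponds to epsilon = -1/n.
predicted_shift = -influence_up_params(z, train, theta_hat) / n
actual_shift = fit(train[1:]) - theta_hat
predicted_loss_change = -influence_up_loss(z, z_test, train, theta_hat) / n

print(predicted_shift, actual_shift, predicted_loss_change)
```

In this toy problem the actual shift equals the predicted shift times S / (S - x^2), where S is the sum of squared inputs and x is the input of the removed point, so the first-order estimate is close whenever the removed point has low leverage.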
Differentiating with respect to \delta and applying the chain rule gives the influence of perturbing z on the loss at z_{\text{test}}:

\[
\begin{aligned}
\mathcal{I}_{\text{pert,loss}}\left(z, z_{\text{test}}\right)^{\top} &\stackrel{\text{def}}{=} \left.\nabla_{\delta} L\left(z_{\text{test}}, \hat{\theta}_{z_{\delta},-z}\right)^{\top}\right|_{\delta=0} \\
&= -\nabla_{\theta} L\left(z_{\text{test}}, \hat{\theta}\right)^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta})
\end{aligned}
\]

Effect of the training loss and H terms in \mathcal{I}_{\text{up,loss}}\left(z, z_{\text{test}}\right). For logistic regression,

\[
\mathcal{I}_{\text{up,loss}}\left(z, z_{\text{test}}\right) = -y_{\text{test}}\, y \cdot \sigma\left(-y_{\text{test}} \theta^{\top} x_{\text{test}}\right) \cdot \sigma\left(-y \theta^{\top} x\right) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x
\]

Influence functions can be used to debug training data: \mathcal{I}_{\text{up,loss}}\left(z, z_{\text{test}}\right) identifies the training points that most affect the loss at a given test point.

Stochastic estimation: rather than forming and inverting H explicitly, the inverse-Hessian term is approximated with Hessian-vector products, so the cost for n training points and p parameters is on the order of O(np).

Experiment: dog vs. fish images from ImageNet, 900 training examples per class, comparing Inception v3 with an SVM with RBF kernel.

Poisoning attack built with influence functions: on 591 test images, perturbing a single training image flipped the prediction on 57% of them, two training images flipped 77%, and 10 training images flipped 590/591.

Related work: this is a training-set attack, as opposed to an adversarial example, which perturbs the test input.

Influence functions can also be used to debug bad cases.

Fixing mislabeled examples: with 10% of the training labels flipped, ranking training points by \mathcal{I}_{\text{up,loss}}\left(z_{i}, z_{i}\right) finds the flipped labels more effectively than ranking by training loss or checking at random.

See also: Less Is Better: Unweighted Data Subsampling via Influence Function.

Influence functions vs. leave-one-out retraining: the correlation is 0.86 in the non-convex setting, and 0.95 for an SVM with a smoothed hinge loss.

Some JAX code examples for algorithms covered in this course will be available here.
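To make the logistic-regression expression for \mathcal{I}_{\text{up,loss}} concrete, the sketch below checks numerically that it agrees with the generic form -\nabla_{\theta} L(z_{\text{test}})^{\top} H^{-1} \nabla_{\theta} L(z) on a two-parameter model. The data and helper names are hypothetical, and `theta` is a fixed stand-in for a fitted \hat{\theta}, since the identity being checked is purely algebraic.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def grad_loss(x, y, theta):
    """Gradient of log(1 + exp(-y * theta.x)) with respect to theta."""
    s = sigmoid(-y * dot(theta, x))
    return [-s * y * xi for xi in x]

def hessian(xs, theta):
    """Mean of sigma(t) * sigma(-t) * x x^T over the training inputs (2x2)."""
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        t = dot(theta, x)
        w = sigmoid(t) * sigmoid(-t)
        for i in range(2):
            for j in range(2):
                h[i][j] += w * x[i] * x[j] / len(xs)
    return h

def solve2(h, v):
    """Return H^{-1} v for a 2x2 matrix via the explicit inverse."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * v[0] - h[0][1] * v[1]) / det,
            (-h[1][0] * v[0] + h[0][0] * v[1]) / det]

# Hypothetical training set with labels in {-1, +1}.
xs = [[1.0, 0.5], [0.2, 1.5], [-1.0, 0.3], [-0.4, -1.2]]
ys = [1, 1, -1, -1]
theta = [0.8, 0.6]           # stand-in for the fitted parameters (assumption)
x_test, y_test = [0.9, 0.1], 1
x, y = xs[1], ys[1]

H = hessian(xs, theta)
g_test = grad_loss(x_test, y_test, theta)
g_train = grad_loss(x, y, theta)

# Generic form: -grad L(z_test)^T H^{-1} grad L(z).
generic = -dot(g_test, solve2(H, g_train))

# Logistic-regression form from the text.
closed = (-y_test * y
          * sigmoid(-y_test * dot(theta, x_test))
          * sigmoid(-y * dot(theta, x))
          * dot(x_test, solve2(H, x)))

# Same quantity with the inverse Hessian dropped, for comparison.
no_hessian = -dot(g_test, g_train)

print(generic, closed, no_hessian)
```

Comparing `generic` with `no_hessian` for different training points is one way to see what the H^{-1} term contributes beyond a plain gradient inner product.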
Understanding Black-box Predictions via Influence Functions. Pang Wei Koh & Percy Liang. International Conference on Machine Learning (ICML), 2017.
Presented by Theo, Aditya, Patrick
1. Influence functions: definitions and theory
2. Efficiently calculating influence functions
3.

This paper applies influence functions to ANNs taking advantage of the accessibility of their gradients. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products.

This package offers two modes of computation to calculate the influence
affecting everything else.
Is a dict/json containing the influences calculated of all training data
Thus, you can easily find mislabeled images in your dataset, or
influence-instance.
test images, the helpfulness is ordered by average helpfulness to the
initial value of the Hessian during the s_test calculation, this is

25% Colab notebook and paper presentation.

Which optimization techniques are useful at which batch sizes?
Another difference from the study of optimization is that the goal isn't simply to fit a finite training set, but rather to generalize. Why neural nets generalize despite their enormous capacity is intimately tied to the dynamics of training. Hopefully this understanding will let us improve the algorithms. Time permitting, we'll also consider the limit of infinite depth.

Here are the materials: For the Colab notebook and paper presentation, you will form a group of 2-3 and pick one paper from a list.

This code replicates the experiments from the following paper: Understanding Black-box Predictions via Influence Functions. ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70.

Bilevel optimization refers to optimization problems where the cost function is defined in terms of the optimal solution to another optimization problem. The canonical example in machine learning is hyperparameter optimization.
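The bilevel setting described above can be illustrated with a one-parameter hyperparameter-tuning problem. The sketch below is a minimal example under assumed data and an assumed L2 penalty: the inner problem fits a weight on training data, the outer problem scores it on validation data, and the gradient with respect to the hyperparameter is obtained by implicit differentiation and compared against a finite difference. None of the names come from the course materials.

```python
# Toy bilevel problem (illustrative only): tune an L2 penalty lam on validation data.
# Inner: w*(lam) = argmin_w 0.5 * mean((w*x - y)^2 over train) + 0.5 * lam * w^2
# Outer: g(lam) = 0.5 * mean((w*(lam)*x - y)^2 over val)

train = [(1.0, 1.2), (2.0, 1.7), (3.0, 3.4)]  # hypothetical data
val = [(1.5, 1.4), (2.5, 2.6)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def inner_solution(lam):
    """Closed-form minimiser of the inner (training) objective."""
    return mean(x * y for x, y in train) / (mean(x * x for x, _ in train) + lam)

def outer_loss(lam):
    """Validation loss evaluated at the inner optimum."""
    w = inner_solution(lam)
    return 0.5 * mean((w * x - y) ** 2 for x, y in val)

def hypergradient(lam):
    """d outer_loss / d lam via the implicit function theorem."""
    w = inner_solution(lam)
    d2_inner_dw2 = mean(x * x for x, _ in train) + lam  # inner Hessian (scalar)
    d2_inner_dw_dlam = w                                 # mixed second derivative
    dw_dlam = -d2_inner_dw_dlam / d2_inner_dw2
    d_outer_dw = mean((w * x - y) * x for x, y in val)
    return d_outer_dw * dw_dlam

lam = 0.1
step = 1e-5
finite_diff = (outer_loss(lam + step) - outer_loss(lam - step)) / (2 * step)
print(hypergradient(lam), finite_diff)
```

The implicit-differentiation step uses the same inverse-Hessian structure as the influence function, with the hyperparameter playing the role of the upweighting \epsilon.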
Understanding Black-box Predictions via Influence Functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1885--1894. arXiv preprint arXiv:1703.04730 (2017). https://dl.acm.org/doi/10.5555/3305381.3305576

How can we explain the predictions of a black-box model?

Approach. Consider a prediction problem from some input space X (e.g., images) to an output space Y (e.g., labels).

If you have questions, please contact Pang Wei Koh (pangwei@cs.stanford.edu).

There are several neural net libraries built on top of JAX. Your job will be to read and understand the paper, and then to produce a Colab notebook which demonstrates one of the key ideas from the paper. Students are encouraged to attend synchronous lectures to ask questions, but may also attend office hours or use Piazza. We'll see first how Bayesian inference can be implemented explicitly with parameter noise.

Dependencies: Numpy/Scipy/Scikit-learn/Pandas

Influence functions can of course also be used for data other than images,
can speed up the calculation significantly as no duplicate calculations take
can take significant amounts of disk space (100s of GBs) but with a fast SSD
your individual test dataset.
Most importantly however, s_test is only
However, as stated
training time, and reduce memory requirements.
The next figure shows the same but for a different model, DenseNet-100/12.
more recursions when approximating the influence.
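The text above mentions an implementation that needs only Hessian-vector products, and "recursions" when approximating the influence. One common way to realise this is a truncated Neumann-series recursion for H^{-1} v; the sketch below shows it on a 2x2 stand-in Hessian. It is a simplified illustration under stated assumptions: the exact Hessian is used at every step (a stochastic estimator would sample a training point or mini-batch instead), and the names `hvp`, `inverse_hvp`, `recursions`, and `scale` are hypothetical rather than any package's API.

```python
# Approximating H^{-1} v with Hessian-vector products only (illustrative sketch).
# Recursion: h_0 = v,  h_j = v + h_{j-1} - hvp(h_{j-1}) / scale,  estimate = h_j / scale.

H = [[2.0, 0.5], [0.5, 1.0]]  # small stand-in Hessian (symmetric positive definite)

def hvp(v):
    """Hessian-vector product oracle; the only access to H the recursion needs."""
    return [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]

def inverse_hvp(v, recursions, scale=4.0):
    """Estimate H^{-1} v; `scale` must exceed half the largest eigenvalue of H."""
    h = list(v)
    for _ in range(recursions):
        hv = hvp(h)
        h = [v[i] + h[i] - hv[i] / scale for i in range(2)]
    return [hi / scale for hi in h]

def max_error(estimate, exact):
    """Largest absolute component-wise difference."""
    return max(abs(a - b) for a, b in zip(estimate, exact))

v = [1.0, 1.0]
exact = [0.5 / 1.75, 1.5 / 1.75]  # H^{-1} v from the explicit 2x2 inverse

err_10 = max_error(inverse_hvp(v, 10), exact)
err_100 = max_error(inverse_hvp(v, 100), exact)
print(err_10, err_100)
```

More recursions give a tighter estimate at proportionally higher cost, which is the trade-off behind choosing the recursion depth.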