Notes on BitNet: Scaling 1-bit Transformers for Large Language Models

Disclaimer: This is part of my notes on AI research papers. I do this to learn and communicate what I understand. Feel free to comment if you have any suggestions; they would be very much appreciated. The following post is a comment on the paper BitNet: Scaling 1-bit Transformers for Large Language Models by Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, and Furu Wei....

April 12, 2024 · 5 min · Àlex Pujol Vidal

Notes on LLaVA-Gemma: Accelerating Multimodal Foundation Models With a Compact Language Model

Disclaimer: This is part of my notes on AI research papers. I do this to learn and communicate what I understand. Feel free to comment if you have any suggestions; they would be very much appreciated. The following post is a comment on the paper LLaVA-Gemma: Accelerating Multimodal Foundation Models With a Compact Language Model by Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, and Vasudev Lal. Hinck et al....

April 11, 2024 · 3 min · Àlex Pujol Vidal

Notes on Auto-Encoding Variational Bayes

Disclaimer: This is part of my notes on AI research papers. I do this to learn and communicate what I understand. Feel free to comment if you have any suggestions; they would be very much appreciated. The following post is a comment on the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling. Their contributions are motivated by the following question: How can we perform efficient approximate inference and learning with directed probabilistic models whose continuous latent variables and/or parameters have intractable posterior distributions?...

April 10, 2024 · 6 min · Àlex Pujol Vidal

Notes on SSSE: Efficiently Erasing Samples From Trained Machine Learning Models

Disclaimer: This is part of my notes on AI research papers. I do this to learn and communicate what I understand. Feel free to comment if you have any suggestions; they would be very much appreciated. The following post is a comment on the paper SSSE: Efficiently Erasing Samples From Trained Machine Learning Models by Alexandra Peste, Dan Alistarh, and Christoph H. Lampert. Peste et al. propose Single-Step Sample Erasure (SSSE), a method to efficiently and effectively erase samples from trained machine learning models....

April 10, 2024 · 4 min · Àlex Pujol Vidal

Notes on Mixed-Privacy Forgetting in Deep Networks

Disclaimer: This is part of my notes on AI research papers. I do this to learn and communicate what I understand. Feel free to comment if you have any suggestions; they would be very much appreciated. The following post is a comment on the paper Mixed-Privacy Forgetting in Deep Networks by Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto. Golatkar et al. introduce a novel method for forgetting in a mixed-privacy setting, where a core subset of the training samples will not be forgotten....

April 9, 2024 · 4 min · Àlex Pujol Vidal