When I tried to use glmnet, probably the most popular R package for regularized regression, to do weighted lasso estimation, I struggled with the penalty.factor argument of the glmnet() function. After a series of experiments, I finally understood how it works and how it affects the lambda sequence. I hope this article clarifies it for you as well.

Background Knowledge

First, some background. The purpose of penalty.factor is to apply a different amount of shrinkage to each coefficient, which is what makes weighted lasso possible. A lambda sequence is then used to compute the solution path…
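To see what per-coefficient penalty weights actually do, here is a minimal coordinate-descent sketch in Python/NumPy. This is my own illustration of the weighted-lasso objective, not glmnet's actual algorithm, and the function name weighted_lasso is made up for this example:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Naive coordinate descent for
        min_b (1/2n) * ||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.
    weights[j] plays the role of glmnet's penalty.factor: larger values
    shrink b_j harder, and weights[j] = 0 leaves b_j unpenalized."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            denom = (X[:, j] @ X[:, j]) / n
            b[j] = soft_threshold(rho, lam * weights[j]) / denom
    return b
```

With a large weight on one coordinate, its coefficient is driven to exactly zero while the others are fitted as usual, which is the behavior penalty.factor exposes.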

D-separation is a critical idea in Bayesian Networks and causal inference. The problem it intends to tackle is: given a causal graph G, is a set X of variables independent of another set Y, given a third set Z?

At first sight, it may look intimidating, but a few examples make it much easier to understand. In this article I focus on the applied side and will cover:

  1. three rules to check d-separation in the corresponding scenarios;
  2. one step-by-step algorithm to check d-separation in general;
  3. how to use R package bnlearn to check d-separation.
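One standard general-purpose check (item 2 above) is the moral-graph criterion: restrict the DAG to the ancestors of X ∪ Y ∪ Z, moralize it (connect co-parents and drop edge directions), delete Z, and test whether X and Y are still connected. Below is a small pure-Python sketch of that procedure; the child → parents dict representation and the function name d_separated are assumptions for illustration, not bnlearn's API:

```python
def d_separated(parents, X, Y, Z):
    """Check whether X and Y are d-separated given Z in a DAG.
    parents: dict mapping each node to the set of its parent nodes."""
    # 1. Ancestral subgraph of X | Y | Z.
    relevant, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v in relevant:
            continue
        relevant.add(v)
        stack.extend(parents.get(v, ()))
    # 2. Moralize: undirected parent-child edges plus "married" co-parents.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. Delete the conditioning set Z.
    for z in Z:
        relevant.discard(z)
    # 4. X and Y are d-separated iff no path connects them in what remains.
    seen, stack = set(), list(X)
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in Y:
            return False
        stack.extend(n for n in adj.get(v, ()) if n in relevant)
    return True
```

On a chain A → B → C this reports dependence marginally and independence given B; on a collider A → C ← B it reports the opposite, matching the three rules discussed next.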

Three Scenarios and Three Rules


The original paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.


This paper is more engineering-oriented than method-oriented, in my opinion. It doesn't propose new model architectures or training techniques, yet its contribution is tremendous. With the goal of isolating the exact contribution of various architectures, training objectives, techniques, and training datasets to transfer learning in NLP, the authors perform a series of systematic experiments and empirically identify the most promising strategies. They then combine their findings to propose the pre-trained model T5 and the dataset C4. …

1. Official Documentation

First, note that scatter_() is an in-place function, meaning that it modifies the input tensor directly.

The official documentation, scatter_(dim, index, src) → Tensor, tells us that the parameters are dim, the index tensor, and the source tensor. dim specifies along which dimension the index tensor operates; the other dimensions are left unchanged. As the name suggests, the goal is to scatter values from the source tensor src into the input tensor self. …
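For dim=0 on a 2-D tensor, the rule is self[index[i][j]][j] = src[i][j]: the column index j is kept, and only the row position comes from index. Here is a tiny pure-Python sketch of that semantics on nested lists (no PyTorch needed; scatter_dim0 is an illustrative name, not a real torch function):

```python
def scatter_dim0(self_t, index, src):
    """Mimic Tensor.scatter_(0, index, src) for 2-D nested lists:
    self_t[index[i][j]][j] = src[i][j]. Mutates self_t in place,
    mirroring the trailing-underscore convention of scatter_()."""
    for i in range(len(index)):
        for j in range(len(index[0])):
            self_t[index[i][j]][j] = src[i][j]
    return self_t
```

For example, scattering one row of values into a 3×4 grid of zeros places each value in the row named by index while keeping its column:

```python
out = scatter_dim0([[0] * 4 for _ in range(3)],
                   [[0, 1, 2, 0]],
                   [[9, 8, 7, 6]])
# out == [[9, 0, 0, 6], [0, 8, 0, 0], [0, 0, 7, 0]]
```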

Yu Yang

A Ph.D. student in Statistics and NLP.
