diff --git a/_posts/2025-08-17-cogen-motion.md b/_posts/2025-08-17-cogen-motion.md
new file mode 100644
index 0000000..a11381e
--- /dev/null
+++ b/_posts/2025-08-17-cogen-motion.md
@@ -0,0 +1,209 @@
+---
+layout: distill
+title: Conditional Generative Models for Motion Prediction
+description: In this blog post, we discuss good engineering practices and the lessons learned—sometimes the hard way—from building conditional generative models (in particular, flow matching) for motion prediction problems.
+tags: motion-prediction trajectory generative-models
+giscus_comments: true
+date: 2025-08-17
+featured: true
+
+authors:
+ - name: Qi Yan
+ url: "https://qiyan98.github.io/"
+ affiliations:
+ name: UBC
+ - name: Yuxiang Fu
+ url: "https://felix-yuxiang.github.io/"
+ affiliations:
+ name: UBC
+
+bibliography: 2025-08-17-cogen-motion.bib
+
+
+# Optionally, you can add a table of contents to your post.
+# NOTES:
+# - make sure that TOC names match the actual section names
+# for hyperlinks within the post to work correctly.
+# - we may want to automate TOC generation in the future using
+# jekyll-toc plugin (https://github.com/toshimaru/jekyll-toc).
+toc:
+ - name: Introduction
+ # if a section has subsections, you can add them as follows:
+ # subsections:
+ # - name: Example Child Subsection 1
+ # - name: Example Child Subsection 2
+ - name: Challenges of Multi-Modal Prediction
+ - name: Engineering Practices and Lessons
+ subsections:
+ - name: Data-Space Predictive Learning Objectives
+ - name: Joint Multi-Modal Learning Losses
+ - name: Exploring Inference Acceleration
+ - name: Summary
+
+# Below is an example of injecting additional post-specific styles.
+# If you use this post as a template, delete this _styles block.
+# _styles: >
+# .fake-img {
+# background: #bbb;
+# border: 1px solid rgba(0, 0, 0, 0.1);
+# box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
+# margin-bottom: 12px;
+# }
+# .fake-img p {
+# font-family: monospace;
+# color: white;
+# text-align: left;
+# margin: 12px 0;
+# text-align: center;
+# font-size: 16px;
+# }
+---
+
+
+
+## Introduction
+
+Needless to say, diffusion-based generative models (equivalently, flow matching models) are amazing inventions. They have shown great capacity to produce high-quality images, videos, audio, and more, whether generating unconditionally on benchmark datasets or conditioning on arbitrary content in the wild.
+In this blog, we discuss a relatively less explored application of **generative models for motion prediction**, which is a fundamental problem in many applications such as autonomous driving and robotics.
+
+In a nutshell, motion prediction is the task of predicting the future trajectories of objects given their past trajectories, plus all sorts of available context information such as surrounding objects and high-fidelity maps.
+Implemented with neural networks, the pipeline is simply:
+```
+Past Trajectory + Context Information ---> Neural Network ---> Future Trajectory
+```
+
+To produce meaningful future trajectories, we condition the generative models on the past trajectory and the context information.
+Borrowed from our paper , the pipeline looks like this:
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/cogen-motion/noise_to_traj_moflow.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ The pipeline of motion prediction using conditional (denoising) generative models .
+
+
+Early human motion prediction datasets, such as the well-known ETH-UCY and SDD datasets (see for a summary), mostly do not come with heavy context information, which is exactly the setting the above figure depicts.
+However, modern industry-standard datasets such as the Waymo Open Motion Dataset and the Argoverse series come with much richer context information, such as high-fidelity maps, which requires more compute to process. No matter how complex the context is, the generative model must be guided to **produce spatially and temporally coherent trajectories consistent with the past**.
+
+## Challenges of Multi-Modal Prediction
+
+Motion *prediction*, as the name suggests, is inherently a forecasting task. For each input in a dataset, only one realization of the future motion is recorded, even though multiple plausible outcomes often exist. This mismatch between the inherently **multi-modal** nature of future motion and the **single ground-truth** annotation poses a core challenge for evaluation.
+
+In practice, standard metrics require models to output multiple trajectories, which are then compared against the observed ground truth. For example, **ADE (Average Displacement Error)** and **FDE (Final Displacement Error)** measure trajectory errors, and the minimum ADE/FDE across predictions is typically reported. This setup implicitly encourages models to produce diverse hypotheses, but only rewards the one closest to the recorded future. Datasets such as Waymo Open Motion and Argoverse extend evaluation with metrics targeting uncertainty calibration. For instance, Waymo’s **mAP** rewards models that assign higher confidence to trajectories closer to the ground truth.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/cogen-motion/vehicle_1_trajflow.gif" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Multi-modal trajectory forecasting made by TrajFlow on the Waymo Open Motion Dataset . Multiple predictions are visualized using different colors, while the single ground truth is shown in red.
+
+
+The strong dependency of current evaluation metrics on a single ground truth, assessed instance by instance, poses a particular challenge for generative models. Although the task inherently requires generating diverse trajectories, models are only rewarded when one of their outputs happens to align closely with the recorded ground truth.
+
+As a result, the powerful ability of generative models to produce diverse samples from noise does not necessarily translate into better performance under current metrics. For example, MotionDiffuser , a diffusion-based model that generates one trajectory at a time, requires a complex post-processing pipeline—ranging from likelihood-based filtering to hand-crafted attractor/repeller cost functions and non-maximum suppression (NMS) for outlier removal—in order to achieve reasonably good results.
+
+## Engineering Practices and Lessons
+
+Now let's dive into the technical side of the problem.
+In the forward process of flow matching, we adopt a simple linear interpolation between the clean trajectories $$Y^1 \sim q$$, where $$q$$ is the data distribution, and pure Gaussian noise $$Y^0 \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$:
+
+$$
+Y^t = (1-t)Y^0 + tY^1 \qquad t \in [0, 1].
+$$
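+To make the shapes concrete, here is a minimal PyTorch sketch of this corruption step, assuming trajectories are stored as a `[batch, agents, T_f, 2]` tensor (the tensor layout and the `corrupt` helper name are illustrative, not taken from our codebase):
+
+```python
+import torch
+
+def corrupt(y1: torch.Tensor):
+    """Linear interpolation between Gaussian noise Y^0 and clean trajectories Y^1."""
+    y0 = torch.randn_like(y1)                 # Y^0 ~ N(0, I)
+    t = torch.rand(y1.shape[0], 1, 1, 1)      # t ~ U[0, 1], broadcast over agents and time
+    yt = (1 - t) * y0 + t * y1                # Y^t = (1 - t) Y^0 + t Y^1
+    return yt, y0, t
+```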
+
+The reverse process, which allows us to generate new samples, is governed by the ordinary differential equations (ODEs):
+
+$$
+\mathrm{d} Y^t = v_\theta(Y^t, C, t)\mathrm{d}t,
+$$
+
+where $$v_\theta$$ is the parametrized vector field approximating the straight flow $$U^t = Y^1 - Y^0$$. Here, $$C$$ denotes the aggregated contextual information of agents in a scene, including the past trajectory and any other available context information.
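+At sampling time, the ODE can be integrated with a plain Euler scheme. The sketch below only illustrates the iteration, assuming a trained callable `v_theta(y, c, t)`; the step count and the uniform time grid are illustrative choices rather than our exact sampler settings:
+
+```python
+import torch
+
+@torch.no_grad()
+def sample(v_theta, c, shape, num_steps: int = 10):
+    """Simulate dY^t = v_theta(Y^t, C, t) dt from t = 0 (noise) to t = 1 (data)."""
+    y = torch.randn(shape)                    # start from Y^0 ~ N(0, I)
+    dt = 1.0 / num_steps
+    for i in range(num_steps):
+        t = torch.full((shape[0],), i * dt)   # current time for each batch element
+        y = y + v_theta(y, c, t) * dt         # Euler update
+    return y                                  # approximate sample of Y^1
+```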
+
+### Data-Space Predictive Learning Objectives
+
+From an engineering standpoint, a somewhat **bitter lesson** we encountered is that **existing predictive learning objectives remain remarkably strong**. Despite the appeal of noise-prediction formulations (e.g., $\epsilon$-prediction introduced in DDPM and later adopted in flow matching ), straightforward predictive objectives in the data space—such as direct $$\hat{x}_0$$ reconstruction in DDPM notation—consistently yield more stable convergence.
+Note that we follow the flow matching notation in , using $t=1$ for the data distribution and $t=0$ for the noise distribution, which is the opposite of the original DDPM notation in .
+
+
+Concretely, by rearranging the original linear flow objective, we define a neural network
+
+$$
+D_\theta := Y^t + (1-t)v_\theta(Y^t, C, t),
+$$
+
+which is trained to recover the future trajectory $$Y^1$$ in the data space. The corresponding objective is:
+
+$$
+\mathcal{L}_{\text{FM}} = \mathbb{E}_{Y^t, Y^1 \sim q, \, t \sim \mathcal{U}[0,1]} \left[ \frac{\| D_{\theta}(Y^t, C, t) - Y^1 \|_2^2}{(1 - t)^2} \right].
+$$
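+A hedged PyTorch sketch of this objective is shown below; the clamp on $(1-t)$ is our own guard against the weight blowing up near $t = 1$ and is not part of the formula above, and the 4-D tensor layout follows the earlier snippet:
+
+```python
+import torch
+
+def data_space_loss(v_theta, y1, c, eps: float = 1e-3):
+    """Flow matching loss in data space: recover Y^1 via D_theta and reweight by 1/(1-t)^2."""
+    y0 = torch.randn_like(y1)
+    t = torch.rand(y1.shape[0], 1, 1, 1)
+    yt = (1 - t) * y0 + t * y1                # forward interpolation
+    d = yt + (1 - t) * v_theta(yt, c, t)      # D_theta := Y^t + (1 - t) v_theta(Y^t, C, t)
+    w = 1.0 / (1 - t).clamp(min=eps) ** 2     # time-dependent weight from the objective
+    return (w * (d - y1) ** 2).mean()
+```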
+
+Our empirical observation is that data-space predictive learning objectives outperform denoising objectives. We argue that this is largely influenced by the current evaluation protocol, which heavily rewards model outputs that are close to the ground truth.
+
+During training, the original denoising target matches the vector field $Y^1 - Y^0$, defined as the difference between the data sample (future trajectory) and the noise sample (drawn from the noise distribution). Under the current proximity-based metrics, this objective is harder to optimize than the predictive objective because of the stochasticity introduced by $Y^0$, as the metrics do not adequately reward diverse forecasting. Moreover, during the sampling process, small errors in the vector field model $v_\theta$—measured with respect to the single ground-truth velocity field at intermediate time steps—can be amplified through subsequent iterative steps. Consequently, increasing inference-time compute may not necessarily improve results without incorporating regularization from the data-space loss.
+Interestingly, in our experiments, we found that flow-matching ODEs—thanks to their less noisy inference process—usually perform more stably than diffusion-model SDEs, which is surprising: in image generation, as shown in SiT , ODE-based samplers are generally weaker than SDE-based samplers.
+
+### Joint Multi-Modal Learning Losses
+
+Building on this, another key engineering practice was to introduce **joint multi-modal learning losses**. Our network $$D_\theta$$ generates $$K$$ scene-level correlated waypoint predictions $$\{S_i\}_{i=1}^K$$ along with classification logits $$\{\zeta_i\}_{i=1}^K$$. This allows us to capture diverse futures in a single inference loop while still grounding learning in a predictive loss.
+Usually, different datasets have different conventions for what a proper $K$ should be: for example, $K=20$ is used for the ETH-UCY dataset, while $K=6$ is used for the Waymo Open Motion Dataset .
+Such a principle of combined regression and classification losses to encourage trajectory multi-modality is ubiquitous in the motion prediction literature, as seen in MTR , UniAD , and QCNet , though these methods differ in other implementation details.
+For simplicity, we omit the time-dependent weighting and define the multi-modal flow matching loss:
+
+$$
+\bar{\mathcal{L}}_{\text{FM}} = \mathbb{E}_{Y^t, Y^1 \sim q, \, t \sim \mathcal{U}[0,1]} \left[ \| S_{j^*} - Y^1 \|_2^2 + \text{CE}(\zeta_{1:K}, j^*) \right],
+$$
+
+where $$j^* = \arg\min_{j} \| S_j - Y^1 \|_2^2$$ indicates the closest waypoint to the ground-truth trajectory and $$\text{CE}(\cdot,\cdot)$$ denotes cross-entropy. On tasks where confidence calibration is important, such as those measured by the mAP metric in the Waymo Open Motion Dataset, we refer readers to our paper for further details on uncertainty calibration.
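+A minimal sketch of this winner-takes-all loss is given below; the tensor shapes and the helper name are assumptions for illustration, and we omit the uncertainty-calibration terms discussed in the paper:
+
+```python
+import torch
+import torch.nn.functional as F
+
+def multimodal_loss(waypoints, logits, y1):
+    """Winner-takes-all regression plus mode classification.
+
+    waypoints: [B, K, ...] predicted futures, logits: [B, K] confidences, y1: [B, ...] ground truth.
+    """
+    err = ((waypoints - y1.unsqueeze(1)) ** 2).flatten(2).sum(-1)  # [B, K] squared L2 per mode
+    j_star = err.argmin(dim=1)                                     # index of the closest mode
+    reg = err.gather(1, j_star[:, None]).squeeze(1).mean()         # regression on the winner only
+    cls = F.cross_entropy(logits, j_star)                          # CE toward the winning mode
+    return reg + cls
+```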
+
+We acknowledge that some prior works, such as MotionLM and MotionDiffuser , generate one trajectory at a time and have demonstrated strong performance. However, since these methods are not open-sourced, we are unable to conduct direct comparisons or measure their runtime efficiency. We conjecture that requiring multiple inference loops (tens to hundreds) is considerably slower than our one-step generator—particularly on smaller-scale datasets, where the one-step approach achieves comparable performance without significant degradation.
+
+### Exploring Inference Acceleration
+
+To accelerate inference in flow-matching models, which typically require tens or even hundreds of iterations for ODE simulation, we adopt an underrated idea from the image generation literature: conditional **IMLE (implicit maximum likelihood estimation)** . IMLE provides a way to distill an iterative generative model into a **one-step generator**.
+
+The IMLE family consists of generative models designed to produce diverse samples in a single forward pass, conceptually similar to the generator in GANs . In our setting, we construct a conditional IMLE model that takes the same context $$C$$ as the teacher flow-matching model and learns to match the teacher’s motion prediction results directly in the data space.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/cogen-motion/imle_moflow.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Pipeline of the IMLE distillation process in our work .
+
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/cogen-motion/IMLE_algorithm.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+The IMLE distillation process is summarized in `Algorithm 1`. Lines 4–6 describe the standard ODE-based sampling of the teacher model, which produces $K$ correlated multi-modal trajectory predictions $$\hat{Y}^1_{1:K}$$ conditioned on the context $C$. A conditional IMLE generator $G_\phi$ then uses a noise vector $Z$ and context $C$ to generate $K$-component trajectories $\Gamma$, matching the shape of $$\hat{Y}^1_{1:K}$$.
+
+Unlike direct distillation, the conditional IMLE objective generates **more** samples than those available in the teacher’s dataset for the same context $C$. Specifically, $m$ i.i.d. samples are drawn from $G_\phi$, and the one closest to the teacher prediction $$\hat{Y}^1_{1:K}$$ is selected for loss computation. This nearest-neighbor matching ensures that the teacher model’s modes are faithfully captured.
+
+To preserve trajectory multi-modality, we employ the Chamfer distance $d_{\text{Chamfer}}(\hat{Y}^1, \Gamma)$ as our loss function:
+
+$$
+\mathcal{L}_{\text{IMLE}}(\hat{Y}^1_{1:K}, \Gamma) = \dfrac{1}{K} \left( \sum_{i=1}^K \min_j \|\hat{Y}^1_i - \Gamma^{(j)}\| + \sum_{j=1}^K \min_i \|\hat{Y}^1_i - \Gamma^{(j)}\| \right)
+$$
+
+where $\Gamma^{(i)} \in \mathbb{R}^{A \times 2T_f}$ is the $i$-th component of the IMLE-generated correlated trajectory.
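+A small sketch of this symmetric Chamfer matching, with each of the $K$ components flattened into a vector (the flattening convention and the helper name are our own choices for illustration):
+
+```python
+import torch
+
+def chamfer_loss(teacher, student):
+    """Symmetric Chamfer distance between two sets of K flattened trajectory components.
+
+    teacher, student: [K, D] with D = A * 2 * T_f after flattening each component.
+    """
+    d = torch.cdist(teacher, student)                       # [K, K] pairwise L2 distances
+    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
+```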
+
+Nonetheless, the acceleration of diffusion-based models—particularly through distillation—is evolving rapidly. Our work with IMLE is just one attempt in this direction, and we are actively exploring further improvements to extend its applicability to broader domains.
+
+
+## Summary
+
+We reviewed the challenges and engineering insights gained from developing conditional generative models for motion prediction, primarily drawing on our previous works . The task requires generating diverse trajectories, yet common evaluation metrics such as ADE and FDE primarily reward alignment with a single ground-truth trajectory.
+
+From these experiences, we identified two useful engineering practices:
+- Data-space predictive learning objectives outperform denoising-based approaches, leading to more stable convergence.
+- Joint multi-modal learning losses that integrate regression and classification more effectively capture trajectory diversity.
+
+In addition, we explored the IMLE distillation technique to accelerate inference by compressing iterative processes into a one-step generator, while preserving multi-modality through Chamfer distance losses.
diff --git a/_posts/2025-08-18-gen-graph.md b/_posts/2025-08-18-gen-graph.md
new file mode 100644
index 0000000..4d2bfa9
--- /dev/null
+++ b/_posts/2025-08-18-gen-graph.md
@@ -0,0 +1,363 @@
+---
+layout: distill
+title: On the Permutation Invariance of Graph Generative Models
+description: This blog post discusses the permutation invariance principle of graph generative models, which has often been taken for granted in graph-related tasks. While permutation symmetry is an elegant property of graph data, there is still more to learn about its empirical implications.
+tags: graph permutation-invariance generative-models
+giscus_comments: true
+date: 2025-08-18
+featured: true
+
+authors:
+ - name: Qi Yan
+ url: "https://qiyan98.github.io/"
+ affiliations:
+ name: UBC
+
+bibliography: 2025-08-18-gen-graph.bib
+
+
+# Optionally, you can add a table of contents to your post.
+# NOTES:
+# - make sure that TOC names match the actual section names
+# for hyperlinks within the post to work correctly.
+# - we may want to automate TOC generation in the future using
+# jekyll-toc plugin (https://github.com/toshimaru/jekyll-toc).
+toc:
+ - name: Introduction
+ # if a section has subsections, you can add them as follows:
+ # subsections:
+ # - name: Example Child Subsection 1
+ # - name: Example Child Subsection 2
+ - name: Permutation Symmetry for Graph Representation Learning
+ - name: Rethinking Invariance Principle for Generative Models
+ subsections:
+ - name: Invariance of Probability Distributions
+ - name: Implications on Scalability
+ - name: Post-processing to Reclaim Sample Invariance
+ - name: Additional Experimental Results
+ - name: Discussion with Recent Works
+ - name: Summary
+
+# Below is an example of injecting additional post-specific styles.
+# If you use this post as a template, delete this _styles block.
+# _styles: >
+# .fake-img {
+# background: #bbb;
+# border: 1px solid rgba(0, 0, 0, 0.1);
+# box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
+# margin-bottom: 12px;
+# }
+# .fake-img p {
+# font-family: monospace;
+# color: white;
+# text-align: left;
+# margin: 12px 0;
+# text-align: center;
+# font-size: 16px;
+# }
+---
+
+
+
+## Introduction
+
+Graphs are ubiquitous mathematical objects that arise in various domains, such as social networks, protein structures, and chemical molecules.
+A graph can be formally represented as a set of nodes $V$, optionally associated with node features $X$, and a set of edges $E$.
+Its topology is commonly expressed using an adjacency matrix $A$.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/icml_workshop_graph_3.png" class="img-fluid rounded z-depth-1" %}
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/icml_workshop_graph_4.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Plain graphs composed of nodes and edges (visuals taken from ).
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/molecule_jin_20_icml.png" class="img-fluid rounded z-depth-1" %}
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/scene_graph_xu_17_cvpr.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Molecule graphs and scene graphs with node and edge features.
+
+
+
+The goal of graph generative models is to generate novel graph samples that resemble those drawn from the true data distribution.
+Graphs can be stored in several ways in computational systems, including tree-based structures , SMILES strings (molecule graphs) , and more commonly, adjacency matrices .
+In this post, we focus on the adjacency matrix representation, which is widely used due to its flexibility.
+Formally, graph generative models learn a distribution $p_\theta$ and generate adjacency matrices $\hat{A} \sim p_\theta$.
+
+## Permutation Symmetry for Graph Representation Learning
+
+Permutation symmetry (i.e., permutation invariance and equivariance) is a fundamental design principle in graph representation learning tasks. Below, we provide a quick recap of the key ideas using resources from .
+
+A straightforward idea for constructing a deep neural network on graphs is to use the adjacency matrix directly as input. For instance, to obtain an embedding of the entire graph, one could flatten the adjacency matrix and pass it through a multi-layer perceptron (MLP):
+
+$$
+\mathbf{z}_G = \text{MLP}({A}[1] \oplus {A}[2] \oplus \dots \oplus {A}[ \vert V \vert]),
+$$
+
+where ${A}[i] \in \mathbb{R}^{\vert V \vert}$ corresponds to the $i$-th row of the adjacency matrix, and $\oplus$ denotes vector concatenation. The drawback of this method is that it relies on the arbitrary ordering of the nodes in the adjacency matrix. Consequently, such a model is not permutation invariant.
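+A toy check makes the problem concrete (the 4-node graph and layer sizes below are arbitrary): feeding the same graph under two node orderings into a flattened-adjacency MLP generally produces two different outputs.
+
+```python
+import torch
+import torch.nn as nn
+
+n = 4
+mlp = nn.Sequential(nn.Linear(n * n, 16), nn.ReLU(), nn.Linear(16, 1))
+
+A = torch.tensor([[0., 1., 0., 0.],
+                  [1., 0., 1., 0.],
+                  [0., 1., 0., 1.],
+                  [0., 0., 1., 0.]])
+P = torch.eye(n)[torch.randperm(n)]              # random permutation matrix
+A_perm = P @ A @ P.T                             # isomorphic graph, different node ordering
+
+print(mlp(A.flatten()), mlp(A_perm.flatten()))   # outputs generally differ: not permutation invariant
+```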
+
+A fundamental requirement for graph representation learning models is that they should exhibit permutation invariance (or equivariance). Formally, a function $f$ that processes an adjacency matrix ${A}$ should ideally satisfy one of the following conditions:
+
+$$
+\begin{aligned}
+f({PAP}^\top) &= f({A}) \quad \text{(Permutation Invariance)}
+\\
+f({PAP}^\top) &= {P} f({A}) P^\top \quad \text{(Permutation Equivariance)}
+\end{aligned}
+$$
+
+where ${P} \in \mathbb{R}^{\vert V \vert \times \vert V \vert}$ represents a permutation matrix .
+
+- **Permutation invariance** implies that the function’s output does not change with different node orderings in the adjacency matrix.
+- **Permutation equivariance** means that permuting the adjacency matrix results in a correspondingly permuted output of $f$.
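+A quick numerical sanity check of these two properties (for a node-level, vector-valued function the equivariance condition specializes to $f(PAP^\top) = Pf(A)$; the random adjacency matrix below is only for illustration):
+
+```python
+import torch
+
+A = (torch.rand(5, 5) < 0.5).float()
+A = torch.triu(A, diagonal=1)
+A = A + A.T                                      # random symmetric adjacency, zero diagonal
+
+P = torch.eye(5)[torch.randperm(5)]
+A_perm = P @ A @ P.T
+
+edge_count = lambda M: M.sum() / 2               # graph-level, permutation-invariant readout
+degrees = lambda M: M.sum(dim=1)                 # node-level, permutation-equivariant function
+
+assert torch.allclose(edge_count(A_perm), edge_count(A))   # f(PAP^T) = f(A)
+assert torch.allclose(degrees(A_perm), P @ degrees(A))     # f(PAP^T) = P f(A)
+```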
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/icml_workshop_graph_task_5.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Common graph representation learning tasks include node-level, edge-level, and graph-level tasks (shown from left to right; image credit: ).
+
+Specifically, node- and edge-level tasks require permutation equivariance, while graph-level tasks require permutation invariance. A well-designed Graph Neural Network (GNN) should generalize across arbitrary permutations of node and edge indices. This ensures that outputs are symmetric with respect to node ordering in the input graph data. It has two key advantages: (i) during training, the model avoids the need to consider all possible node orderings (exhaustive data augmentation), thereby reducing learning complexity and enjoying theoretical benefits ; and (ii) at inference, the model generalizes to unseen graphs without ordering bias, since any permutation is handled consistently.
+
+## Rethinking Invariance Principle for Generative Models
+
+For graph generative models, permutation symmetry boils down to the invariance of the learned probability distribution. Namely, $p_\theta(\hat{A}) = p_\theta(P \hat{A} P^\top)$ must hold for any valid permutation matrix $P$. Two adjacency matrices related by such a permutation are said to be **isomorphic** . Thus, permutation-invariant graph generative models assign equal probability to any graph belonging to the same isomorphism class. One of the pioneering works that achieves this property is , in the context of score-based (a.k.a. diffusion) generative models.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/prob_perm_invar.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Permutation invariance of probability distributions means the likelihood is the same regardless of the node ordering. For example, $p_\theta({A}^{\pi_1}) = p_\theta({A}^{\pi_2}) = p_\theta({A}^{\pi_3}) = p_\theta({A}^{\pi_4}) = \cdots $.
+
+
+### Invariance of Probability Distributions
+
+> **Theorem 1 (from ).**
+> If $ \mathbf{s} : \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N} $ is a permutation equivariant function, then the scalar function $ f_s = \int_{\gamma[\mathbf{0}, {A}]} \langle \mathbf{s}({X}), \mathrm{d}{X} \rangle_F + C $ is permutation invariant, where $ \langle {A}, {B} \rangle_F = \mathrm{tr}({A}^\top {B}) $ is the Frobenius inner product, $ \gamma[\mathbf{0}, {A}] $ is any curve from $ \mathbf{0} = \\{0\\}\_{N \times N} $ to $ {A} $, and $ C \in \mathbb{R} $ is a constant.
+
+We refer readers to for proof details. The implication of this theorem is that the scalar function $f_s$ can characterize the probability density $p_\theta(\cdot)$ up to a constant.
+This happens when the vector-valued function
+$\mathbf{s}(\cdot)$
+represents the gradient of the log-likelihood, commonly called the (Stein) score function in the context of generative models .
+
+Thus, if the learned gradient of the log-likelihood
+$\mathbf{s}\_\theta(A) = \nabla\_{A} \log p\_\theta(A)$
+is permutation equivariant, then the implicitly defined log-likelihood function
+$\log p_\theta(A)$
+is permutation invariant, according to Theorem 1, as given by the line integral of
+$\mathbf{s}_\theta(A)$:
+
+$$
+\log p_\theta({A}) = \int_{\gamma[\mathbf{0}, {A}]} \langle \mathbf{s}_\theta({X}), \mathrm{d} {X} \rangle_F + \log p_\theta(\mathbf{0}).
+$$
+
+Based on this theorem, as long as the score estimation neural network (equivalently, diffusion denoising network) is permutation equivariant, we can prove that the learned probability distribution is permutation invariant. Later works such as GDSS and DiGress also follow this idea.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/graph_diffusion.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ The diffusion processes for graph generation using adjacency matrices in the continuous state space.
+
+
+### Implications on Scalability
+
+While it is appealing to design a provably permutation invariant graph generative model, we find the empirical implications of this property are not as straightforward as one might expect.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/distribution.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Data distribution and target distribution for a 3-node tree graph. For permutation matrix $P_i$ and adjacency matrix $A_i$, filled/blank cells mean one/zero.
+ The probability mass function (PMF) highlights the difference in modes. Our example here also shows graph automorphism (e.g., $P_1$ and $P_2$).
+
+
+#### Theoretical Analysis
+
+Here we provide a simple example to illustrate the potential challenges of learning permutation invariant graph generative models.
+As shown in the above figure, the **empirical graph distribution**
+$p_{\text{data}}$ (i.e., the observed training samples)
+may only assign a non-zero probability to a single observed adjacency matrix in its isomorphism class.
+
+The ultimate goal of a graph generative model is to match this empirical distribution,
+which may be biased by the observed permutation. However, the **target distribution** that generative models are trained to match may differ from the empirical one, depending on the model design w.r.t. permutation symmetry.
+
+For clarity, we define the **effective target distribution** as the closest distribution
+(e.g., measured in total variation distance) to the empirical data distribution achievable
+by the generative model, assuming sufficient data and model capacity.
+
+Formally, given a training set of adjacency matrices
+$\\{A_i\\}\_{i=1}^m$ with $n$ nodes
+, we define the union of their isomorphism classes as
+$\mathcal{A}^* = \bigcup_{i=1}^m \mathcal{I}_{A_i}$.
+Each isomorphism class $\mathcal{I}\_{A\_i}$ represents all adjacency matrices that are topologically equivalent to $A\_i$ but may have different matrix representations.
+The corresponding effective target distribution is
+
+$$p_{\text{data}}^*(A) = \frac{1}{Z} \sum_{A^* \in \mathcal{A}^*} \delta(A - A^*),$$
+
+where $Z = \vert \mathcal{A}^* \vert = O(n!m)$ is the normalizing constant. Note that $Z = n!m$ may not always be achievable due to graph automorphisms .
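+For the 3-node tree from the earlier figure, a brute-force enumeration makes this concrete: out of $3! = 6$ node orderings, only 3 distinct adjacency matrices appear, because the automorphism swapping the two leaves collapses pairs of orderings (the snippet is purely illustrative):
+
+```python
+import itertools
+import numpy as np
+
+A = np.array([[0, 1, 0],
+              [1, 0, 1],
+              [0, 1, 0]])                        # 3-node tree (path graph)
+
+reps = set()
+for perm in itertools.permutations(range(3)):    # all 3! = 6 node orderings
+    P = np.eye(3, dtype=int)[list(perm)]
+    reps.add((P @ A @ P.T).tobytes())            # hashable encoding of P A P^T
+print(len(reps))                                 # 3, not 6: the automorphism halves the count
+```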
+
+> **Lemma 2 (from ).** Assume at least one training graph has $\Omega(n!)$ distinct adjacency matrices in its isomorphism class. Let $\mathcal{P}$ denote all discrete permutation-invariant distributions. The closest distributions in $\mathcal{P}$ to $p_{\text{data}}$, measured by total variation, have at least $\Omega(n!)$ modes. If, in addition, we restrict $\mathcal{P}$ to be the set of permutation-invariant distributions such that $p(A_i) = p(A_j) > 0$ for all matrices in the training set $\\{A_l\\}_{l=1}^m$, then the closest distribution is $\arg\min\_{q \in \mathcal{P}} TV(q, p\_{\text{data}}) = p\_{\text{data}}^*.$
+
+Under mild conditions, $p\_{\text{data}}^*(A)$ with $O(n!m)$ modes, as defined above, becomes the effective target distribution; this is the case for permutation-invariant models using equivariant networks. In contrast, if we employ a non-equivariant network (i.e., the learned density is not invariant), the effective target distribution becomes $p\_{\text{data}}(A)$, which only has $O(m)$ modes. While we discuss the number of modes from a general perspective, the analysis is also relevant to diffusion models.
+The modes of the Dirac delta target distributions determine the components of the Gaussian mixture models (GMMs) in diffusion models, with each component centered exactly on a target mode.
+For an invariant model, the GMMs take the form
+$p_\sigma^*(A) = \frac{1}{Z} \sum_{A^*_i \in \mathcal{A}^*} \mathcal{N}(A; A^*_i, \sigma^2\mathbf{I}),$
+which has $O(n!)$ times more components than the non-invariant one. Thus, learning with a permutation-invariant principle is arguably harder than with a non-invariant one, due to the $O(n!)$ surge in both the modes of the target distribution and the number of GMM components at various noise scales.
+
+
+#### Empirical Investigation
+
+In practice, we typically observe only one adjacency matrix from each isomorphism class
+in the training data $\\{A_i\\}_{i=1}^m$. By applying permutation $n!$ times, one can construct $p\_{\text{data}}^*$ (invariant distribution) from $p\_{\text{data}}$ (non-invariant distribution).
+
+We define a trade-off distribution, called the **$l$-permuted empirical distribution**:
+
+$$
+p_{\text{data}}^l(A) = \frac{1}{ml} \sum_{i=1}^{m} \sum_{j=1}^{l}
+\delta(A - P_j A_i P_j^{\top}),
+$$
+
+where $P_1, \ldots, P_l$ are $l$ distinct permutation matrices.
+The construction of $p\_{\text{data}}^l$ has the following properties: (1) $p\_{\text{data}}^l$ has $O(lm)$ modes governed by $l$; (2) With proper permutations, $p\_{\text{data}}^l = p\_{\text{data}}$ when $l=1$; (3) $p\_{\text{data}}^l \approx p\_{\text{data}}^*$ when $l=n!$ (identical if no non-trivial automorphisms). We use $p\_{\text{data}}^l$ as the diffusion model’s target by tuning $l$ to study the impact of
+mode count on empirical performance.
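+In code, materializing this target is straightforward; the sketch below (the function name and seeding are ours, and we ignore the negligible chance of sampling duplicate permutations) puts the identity permutation first so that $l = 1$ recovers the original data:
+
+```python
+import numpy as np
+
+def l_permuted_dataset(adjs, l, seed=0):
+    """Apply l shared permutations to every training adjacency matrix."""
+    rng = np.random.default_rng(seed)
+    n = adjs[0].shape[0]
+    perms = [np.arange(n)] + [rng.permutation(n) for _ in range(l - 1)]
+    out = []
+    for A in adjs:
+        for p in perms:
+            P = np.eye(n, dtype=A.dtype)[p]
+            out.append(P @ A @ P.T)              # P_j A_i P_j^T
+    return out                                    # m * l target adjacency matrices
+```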
+
+The experimental setup is as follows. We use 10 random regular graphs with 16 nodes, with degrees in $[2, 11]$. The parameter $l$ ranges from 1 to 500, and all models are trained to convergence. For baselines, we consider **invariant models** for $p_{\text{data}}^*$: DiGress and PPGN (a highly expressive GNN with 3-WL discriminative power; the original PPGN model was proposed for graph representation learning, and we reimplemented it based on the official codebase to adapt it to the diffusion objective). For **non-invariant models** (corresponding to $p_{\text{data}}^l, \, l < n!$), we evaluate PPGN with index-based positional embeddings and SwinGNN (ours) . We measure **recall**, defined as the proportion of generated graphs that are isomorphic to any training graph. Recall lies in $[0,1]$, requires isomorphism testing to compute, and is permutation-invariant on its own. A higher recall indicates a stronger ability to capture the toy data distribution.
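+Recall can be computed with off-the-shelf exact isomorphism testing, which is affordable for 16-node graphs; a minimal version (the helper name is ours) could look like:
+
+```python
+import networkx as nx
+
+def recall(generated, training):
+    """Fraction of generated adjacency matrices isomorphic to at least one training graph."""
+    train_graphs = [nx.from_numpy_array(A) for A in training]
+    hits = sum(
+        any(nx.is_isomorphic(nx.from_numpy_array(G), T) for T in train_graphs)
+        for G in generated
+    )
+    return hits / len(generated)
+```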
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/novelty_dataset2.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Quantitative results on the $l$-permuted empirical distribution.
+ Invariant models perform significantly worse than non-invariant models when the number of applied permutations ($l$) is small.
+
+
+
+As shown in the plot above, invariant models such as DiGress and PPGN consistently fail to achieve high recall. In contrast, non-invariant models perform exceptionally well when $l$ is small, where only a few permutations are imposed, but their sample quality degrades significantly as $l$ increases, reflecting the difficulty of learning from distributions with many modes. Importantly, in practice, one typically sets $l=1$ for non-invariant models, which often leads to empirically stronger performance than their invariant counterparts.
+
+
+### Post-processing to Reclaim Sample Invariance
+
+
+While non-invariant models often perform better empirically, they cannot guarantee permutation-invariant sampling. To bridge this gap, we propose a simple and provable trick: apply a random permutation to each generated sample, which yields invariant sampling at no extra cost.
+
+
+> **Lemma 3 (from ).**
+> Let $A$ be a random adjacency matrix distributed according to any graph distribution on $n$ vertices. Let $P\_r \sim \mathrm{Unif}(\mathcal{S}\_n)$ be uniform over the set of permutation matrices. Then, the induced distribution of the random matrix $A\_r = P\_r A P\_r^{\top}$, denoted as $q_\theta(A\_r)$, is permutation invariant, i.e., $q_\theta(A\_r) = q_\theta(P A\_r P^{\top}), \forall P \in \mathcal{S}\_n$.
+
+This trick applies to all generative models. Importantly, the random permutation preserves the isomorphism class: $q_\theta$ is invariant but covers the same set of isomorphism classes as $p_\theta$. Thus, graphs from $q_\theta$ always have isomorphic counterparts under $p_\theta$.
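+The post-processing itself is essentially a two-liner per sample; a minimal sketch:
+
+```python
+import torch
+
+def permute_sample(A: torch.Tensor) -> torch.Tensor:
+    """Relabel a generated adjacency matrix with a uniformly random permutation (Lemma 3)."""
+    n = A.shape[0]
+    P = torch.eye(n, device=A.device)[torch.randperm(n, device=A.device)]
+    return P @ A @ P.T               # same isomorphism class; induced distribution is invariant
+```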
+
+In summary, there are two key observations: (1) invariant models ensure invariant sampling but may harm empirical performance, and (2) invariant sampling does not require invariant models.
+
+### Additional Experimental Results
+
+ Motivated by the aforementioned analysis, we design a new diffusion model that combines non-invariant objectives with invariant sampling to better capture graph distributions.
+
+
+ click here for more details on model design
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/model.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Overall architecture of the proposed model.
+
+ We propose an efficient high-order graph transformer, SwinGNN , for graph diffusion. Drawing inspiration from $k$-order and $k$-WL GNNs , SwinGNN approximates the expressive 2-WL message passing to enhance graph isomorphism testing and function approximation capacity. To address the computational complexity of $O(n^4)$ in 2-WL GNNs, SwinGNN employs a transformer with window-based self-attention , treating edge values as tokens and reducing complexity to $O(n^2M^2)$ by confining attention to local $M \times M$ windows. A shifted window technique further enables cross-window interactions. Additionally, SwinGNN incorporates multi-scale edge representation learning through channel mixing-based downsampling and upsampling layers, constructing hierarchical graph representations to capture long-range interactions efficiently. Experimental results demonstrate its superior performance in graph generation tasks.
+
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/qualitative_result.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Qualitative comparison between invariant and non-invariant models.
+
+
+From the qualitative results, especially on the grid dataset, we can clearly see the difference between the non-invariant and invariant models when comparing our method with the then-SOTA models GRAN and GDSS .
+
+We also demonstrate the model's superiority quantitatively on more synthetic and real-world datasets. The metrics are maximum mean discrepancy (MMD) statistics characterizing graph structure properties, such as degree distributions. Please refer to the expandable box below and the original paper for more details.
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/quantitative_result_1.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ Quantitative comparison between invariant and non-invariant models.
+
+
+
+
+ click here for more quantitative results
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/quantitative_result_2.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ More quantitative results on synthetic and real-world graph datasets.
+ Table 2 demonstrates further results on larger graph datasets.
+ Table 3 shows results on molecule datasets using domain-specific metrics.
+
+
+
+ {% include figure.liquid loading="eager" path="blog/2025/gen-graph/diffusesg_2.png" class="img-fluid rounded z-depth-1" %}
+
+
+
+ We also apply non-invariant models to the scene graph generation task and show consistent scene-graph-to-image pair generation results. Please refer to for more details.
+
+
+
+
+
+## Discussion with Recent Works
+
+It is interesting to see concurrent works adopting similar principles or arriving at similar findings regarding the permutation invariance property for generative models in the graph domain.
+
+For instance, AlphaFold 3 (2024) from DeepMind employs a non-equivariant diffusion model architecture to predict properties of complex proteins. We quote from their paper:
+
+
+ The diffusion module operates directly on raw atom coordinates, and on a coarse abstract token representation, without rotational frames or any equivariant processing.
+
+
+AlphaFold 3 indeed applies data augmentation to encourage equivariance but does not impose this condition through network design.
+
+Similarly, DiffAlign , published at ICLR 2025, also discards the equivariance property for diffusion models on the retrosynthesis task and shows improved performance. From a theoretical perspective, they use copying graphs as a case study to illustrate the limitations of equivariance. Both works provide further empirical evidence that permutation invariance should not simply be taken for granted in generative models.
+
+More interestingly, research from the optimization perspective also addresses this problem and provides fresh insights, studying the relationship between intrinsic equivariance and data augmentation .
+
+
+## Summary
+
+This post examines the role of permutation invariance in **graph generative models**.
+While symmetry is essential in **graph representation learning**, enforcing it in generative models can make learning harder by introducing exponentially many modes.
+Empirical results show that non-invariant models often outperform invariant ones, especially with limited permutations.
+A simple post-processing trick—random permutation—restores invariant sampling without requiring invariant model design.
+Building on this, we propose non-invariant graph generative models that achieve strong performance on synthetic and real-world datasets .
+Recent works like AlphaFold 3 and DiffAlign further support the view that permutation invariance should not be taken for granted for generative models in the graph domain. It appears that more rigorous theoretical analysis is still needed to understand the relationship between permutation invariance and the performance of generative models.
\ No newline at end of file
diff --git a/assets/bibliography/2024-06-20-fvmd-1.bib b/assets/bibliography/2024-06-20-fvmd-1.bib
index 8f06801..d4979c3 100644
--- a/assets/bibliography/2024-06-20-fvmd-1.bib
+++ b/assets/bibliography/2024-06-20-fvmd-1.bib
@@ -12,10 +12,11 @@ @InProceedings{huang2023vbench
year={2024}
}
-@techreport{liu2024fvmd,
+@article{liu2024fvmd,
title={Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos},
author={Liu, Jiahe and Qu, Youran and Yan, Qi and Zeng, Xiaohui and Wang, Lele and Liao, Renjie},
- year={2024},
+ journal={arXiv preprint arXiv:2407.16124},
+ year={2024}
}
@article{dowson1982frechet,
diff --git a/assets/bibliography/2025-08-17-cogen-motion.bib b/assets/bibliography/2025-08-17-cogen-motion.bib
new file mode 100644
index 0000000..933ab74
--- /dev/null
+++ b/assets/bibliography/2025-08-17-cogen-motion.bib
@@ -0,0 +1,146 @@
+@inproceedings{fu2025moflow,
+ title={{MoFlow}: One-step flow matching for human trajectory forecasting via implicit maximum likelihood estimation based distillation},
+ author={Fu, Yuxiang and Yan, Qi and Wang, Lele and Li, Ke and Liao, Renjie},
+ booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
+ pages={17282--17293},
+ year={2025}
+}
+
+@article{yan2025trajflow,
+ title={TrajFlow: Multi-modal Motion Prediction via Flow Matching},
+ author={Yan, Qi and Zhang, Brian and Zhang, Yutong and Yang, Daniel and White, Joshua and Chen, Di and Liu, Jiachao and Liu, Langechuan and Zhuang, Binnan and Shi, Shaoshuai and others},
+ journal={Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems},
+ year={2025}
+}
+
+@inproceedings{ettinger2021large,
+ title={Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset},
+ author={Ettinger, Scott and Cheng, Shuyang and Caine, Benjamin and Liu, Chenxi and Zhao, Hang and Pradhan, Sabeek and Chai, Yuning and Sapp, Ben and Qi, Charles R and Zhou, Yin and others},
+ booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+ pages={9710--9719},
+ year={2021}
+}
+
+@inproceedings{chang2019argoverse,
+ title={Argoverse: 3d tracking and forecasting with rich maps},
+ author={Chang, Ming-Fang and Lambert, John and Sangkloy, Patsorn and Singh, Jagjeet and Bak, Slawomir and Hartnett, Andrew and Wang, De and Carr, Peter and Lucey, Simon and Ramanan, Deva and others},
+ booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
+ pages={8748--8757},
+ year={2019}
+}
+
+@article{wilson2023argoverse,
+ title={Argoverse 2: Next generation datasets for self-driving perception and forecasting},
+ author={Wilson, Benjamin and Qi, William and Agarwal, Tanmay and Lambert, John and Singh, Jagjeet and Khandelwal, Siddhesh and Pan, Bowen and Kumar, Ratnesh and Hartnett, Andrew and Pontes, Jhony Kaesemodel and others},
+ journal={arXiv preprint arXiv:2301.00493},
+ year={2023}
+}
+
+@article{ivanovic2023trajdata,
+ title={trajdata: A unified interface to multiple human trajectory datasets},
+ author={Ivanovic, Boris and Song, Guanyu and Gilitschenski, Igor and Pavone, Marco},
+ journal={Advances in Neural Information Processing Systems},
+ volume={36},
+ pages={27582--27593},
+ year={2023}
+}
+
+@article{ho2020denoising,
+ title={Denoising diffusion probabilistic models},
+ author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
+ journal={Advances in neural information processing systems},
+ volume={33},
+ pages={6840--6851},
+ year={2020}
+}
+
+@article{lipman2022flow,
+ title={Flow matching for generative modeling},
+ author={Lipman, Yaron and Chen, Ricky TQ and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt},
+ journal={arXiv preprint arXiv:2210.02747},
+ year={2022}
+}
+
+@inproceedings{jiang2023motiondiffuser,
+ title={{MotionDiffuser}: Controllable multi-agent motion prediction using diffusion},
+ author={Jiang, Chiyu and Cornman, Andre and Park, Cheolho and Sapp, Benjamin and Zhou, Yin and Anguelov, Dragomir and others},
+ booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
+ pages={9644--9653},
+ year={2023}
+}
+
+@article{shi2022motion,
+ title={Motion transformer with global intention localization and local movement refinement},
+ author={Shi, Shaoshuai and Jiang, Li and Dai, Dengxin and Schiele, Bernt},
+ journal={Advances in Neural Information Processing Systems},
+ volume={35},
+ pages={6531--6543},
+ year={2022}
+}
+
+@inproceedings{hu2023planning,
+ title={Planning-oriented autonomous driving},
+ author={Hu, Yihan and Yang, Jiazhi and Chen, Li and Li, Keyu and Sima, Chonghao and Zhu, Xizhou and Chai, Siqi and Du, Senyao and Lin, Tianwei and Wang, Wenhai and others},
+ booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
+ pages={17853--17862},
+ year={2023}
+}
+
+@inproceedings{zhou2023query,
+ title={Query-centric trajectory prediction},
+ author={Zhou, Zikang and Wang, Jianping and Li, Yung-Hui and Huang, Yu-Kai},
+ booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
+ pages={17863--17873},
+ year={2023}
+}
+
+@inproceedings{ma2024sit,
+ title={{SiT}: Exploring flow and diffusion-based generative models with scalable interpolant transformers},
+ author={Ma, Nanye and Goldstein, Mark and Albergo, Michael S and Boffi, Nicholas M and Vanden-Eijnden, Eric and Xie, Saining},
+ booktitle={European Conference on Computer Vision},
+ pages={23--40},
+ year={2024},
+ organization={Springer}
+}
+
+@article{li2018implicit,
+ title={Implicit maximum likelihood estimation},
+ author={Li, Ke and Malik, Jitendra},
+ journal={arXiv preprint arXiv:1809.09087},
+ year={2018}
+}
+
+@inproceedings{li2019diverse,
+ title={Diverse image synthesis from semantic layouts via conditional {IMLE}},
+ author={Li, Ke and Zhang, Tianhao and Malik, Jitendra},
+ booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+ pages={4220--4229},
+ year={2019}
+}
+
+@article{goodfellow2020generative,
+ title={Generative adversarial networks},
+ author={Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua},
+ journal={Communications of the ACM},
+ volume={63},
+ number={11},
+ pages={139--144},
+ year={2020},
+ publisher={ACM New York, NY, USA}
+}
+
+@inproceedings{fan2017point,
+ title={A point set generation network for 3d object reconstruction from a single image},
+ author={Fan, Haoqiang and Su, Hao and Guibas, Leonidas J},
+ booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
+ pages={605--613},
+ year={2017}
+}
+
+@inproceedings{seff2023motionlm,
+ title={{MotionLM}: Multi-agent motion forecasting as language modeling},
+ author={Seff, Ari and Cera, Brian and Chen, Dian and Ng, Mason and Zhou, Aurick and Nayakanti, Nigamaa and Refaat, Khaled S and Al-Rfou, Rami and Sapp, Benjamin},
+ booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+ pages={8579--8590},
+ year={2023}
+}
diff --git a/assets/bibliography/2025-08-18-gen-graph.bib b/assets/bibliography/2025-08-18-gen-graph.bib
new file mode 100644
index 0000000..15bdd15
--- /dev/null
+++ b/assets/bibliography/2025-08-18-gen-graph.bib
@@ -0,0 +1,208 @@
+@misc{icml2024graphs,
+title={Tutorial on graph learning: Principles, challenges, and Open Directions},
+url={https://icml2024graphs.ameyavelingker.com/},
+journal={ICML 2024 Tutorial},
+author={Arnaiz-Rodríguez, Adrián and Velingker, Ameya},
+year={2024}
+}
+
+@inproceedings{jin2020hierarchical,
+ title={Hierarchical generation of molecular graphs using structural motifs},
+ author={Jin, Wengong and Barzilay, Regina and Jaakkola, Tommi},
+ booktitle={International conference on machine learning},
+ pages={4839--4848},
+ year={2020},
+ organization={PMLR}
+}
+
+@inproceedings{xu2017scene,
+ title={Scene graph generation by iterative message passing},
+ author={Xu, Danfei and Zhu, Yuke and Choy, Christopher B and Fei-Fei, Li},
+ booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
+ pages={5410--5419},
+ year={2017}
+}
+
+@inproceedings{shirzad2022td,
+ title={Td-gen: graph generation using tree decomposition},
+ author={Shirzad, Hamed and Hajimirsadeghi, Hossein and Abdi, Amir H and Mori, Greg},
+ booktitle={International Conference on Artificial Intelligence and Statistics},
+ pages={5518--5537},
+ year={2022},
+ organization={PMLR}
+}
+
+@article{weininger1988smiles,
+ title={SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules},
+ author={Weininger, David},
+ journal={Journal of chemical information and computer sciences},
+ volume={28},
+ number={1},
+ pages={31--36},
+ year={1988},
+ publisher={ACS Publications}
+}
+
+@book{hamilton2020graph,
+ title={Graph representation learning},
+ author={Hamilton, William L},
+ year={2020},
+ publisher={Morgan & Claypool Publishers}
+}
+
+@article{lyle2020benefits,
+ title={On the benefits of invariance in neural networks},
+ author={Lyle, Clare and van der Wilk, Mark and Kwiatkowska, Marta and Gal, Yarin and Bloem-Reddy, Benjamin},
+ journal={arXiv preprint arXiv:2005.00178},
+ year={2020}
+}
+
+@article{bietti2021sample,
+ title={On the sample complexity of learning under geometric stability},
+ author={Bietti, Alberto and Venturi, Luca and Bruna, Joan},
+ journal={Advances in neural information processing systems},
+ volume={34},
+ pages={18673--18684},
+ year={2021}
+}
+
+@inproceedings{niu2020permutation,
+ title={Permutation invariant graph generation via score-based generative modeling},
+ author={Niu, Chenhao and Song, Yang and Song, Jiaming and Zhao, Shengjia and Grover, Aditya and Ermon, Stefano},
+ booktitle={International conference on artificial intelligence and statistics},
+ pages={4474--4484},
+ year={2020},
+ organization={PMLR}
+}
+
+@article{song2019generative,
+ title={Generative modeling by estimating gradients of the data distribution},
+ author={Song, Yang and Ermon, Stefano},
+ journal={Advances in neural information processing systems},
+ volume={32},
+ year={2019}
+}
+
+@article{song2020score,
+ title={Score-based generative modeling through stochastic differential equations},
+ author={Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P and Kumar, Abhishek and Ermon, Stefano and Poole, Ben},
+ journal={arXiv preprint arXiv:2011.13456},
+ year={2020}
+}
+
+@inproceedings{jo2022score,
+ title={Score-based generative modeling of graphs via the system of stochastic differential equations},
+ author={Jo, Jaehyeong and Lee, Seul and Hwang, Sung Ju},
+ booktitle={International conference on machine learning},
+ pages={10362--10383},
+ year={2022},
+ organization={PMLR}
+}
+
+@article{vignac2022digress,
+ title={{DiGress}: Discrete denoising diffusion for graph generation},
+ author={Vignac, Clement and Krawczuk, Igor and Siraudin, Antoine and Wang, Bohan and Cevher, Volkan and Frossard, Pascal},
+ journal={arXiv preprint arXiv:2209.14734},
+ year={2022}
+}
+
+@misc{wiki_Graph_isomorphism,
+ title = {Graph isomorphism},
+ journal = {Wikipedia},
+ year = {2025},
+ url = {https://en.wikipedia.org/wiki/Graph_isomorphism}
+ }
+
+ @misc{wiki_Permutation_matrix,
+ title = {Permutation matrix},
+ journal = {Wikipedia},
+ year = {2025},
+ url = {https://en.wikipedia.org/wiki/Permutation_matrix}
+ }
+
+ @misc{wiki_Graph_automorphism,
+ title = {Graph automorphism},
+ journal = {Wikipedia},
+ year = {2025},
+ url = {https://en.wikipedia.org/wiki/Graph_automorphism}
+ }
+
+ @article{yan2023swingnn,
+ title={{SwinGNN}: Rethinking permutation invariance in diffusion models for graph generation},
+ author={Yan, Qi and Liang, Zhengyang and Song, Yang and Liao, Renjie and Wang, Lele},
+ journal={arXiv preprint arXiv:2307.01646},
+ year={2023}
+}
+
+@article{maron2019provably,
+ title={Provably powerful graph networks},
+ author={Maron, Haggai and Ben-Hamu, Heli and Serviansky, Hadar and Lipman, Yaron},
+ journal={Advances in neural information processing systems},
+ volume={32},
+ year={2019}
+}
+
+@inproceedings{liu2021swin,
+ title={Swin transformer: Hierarchical vision transformer using shifted windows},
+ author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
+ booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+ pages={10012--10022},
+ year={2021}
+}
+
+@inproceedings{morris2019weisfeiler,
+ title={Weisfeiler and leman go neural: Higher-order graph neural networks},
+ author={Morris, Christopher and Ritzert, Martin and Fey, Matthias and Hamilton, William L and Lenssen, Jan Eric and Rattan, Gaurav and Grohe, Martin},
+ booktitle={Proceedings of the AAAI conference on artificial intelligence},
+ volume={33},
+ number={01},
+ pages={4602--4609},
+ year={2019}
+}
+
+@article{xu2024joint,
+ title={Joint generative modeling of scene graphs and images via diffusion models},
+ author={Xu, Bicheng and Yan, Qi and Liao, Renjie and Wang, Lele and Sigal, Leonid},
+ journal={arXiv preprint arXiv:2401.01130},
+ year={2024}
+}
+
+@article{liao2019efficient,
+ title={Efficient graph generation with graph recurrent attention networks},
+ author={Liao, Renjie and Li, Yujia and Song, Yang and Wang, Shenlong and Hamilton, Will and Duvenaud, David K and Urtasun, Raquel and Zemel, Richard},
+ journal={Advances in neural information processing systems},
+ volume={32},
+ year={2019}
+}
+
+@article{laabid2024equivariant,
+ title={Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models},
+ author={Laabid, Najwa and Rissanen, Severi and Heinonen, Markus and Solin, Arno and Garg, Vikas},
+ journal={arXiv preprint arXiv:2405.17656},
+ year={2024}
+}
+
+@article{abramson2024accurate,
+ title={Accurate structure prediction of biomolecular interactions with AlphaFold 3},
+ author={Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick, Joshua and others},
+ journal={Nature},
+ volume={630},
+ number={8016},
+ pages={493--500},
+ year={2024},
+ publisher={Nature Publishing Group UK London}
+}
+
+@article{nordenfors2023optimization,
+ title={Optimization dynamics of equivariant and augmented neural networks},
+ author={Nordenfors, Oskar and Ohlsson, Fredrik and Flinth, Axel},
+ journal={arXiv preprint arXiv:2303.13458},
+ year={2023}
+}
+
+@article{nordenfors2025data,
+ title={Data Augmentation and Regularization for Learning Group Equivariance},
+ author={Nordenfors, Oskar and Flinth, Axel},
+ journal={arXiv preprint arXiv:2502.06547},
+ year={2025}
+}
\ No newline at end of file
diff --git a/blog/2025/cogen-motion/IMLE_algorithm.png b/blog/2025/cogen-motion/IMLE_algorithm.png
new file mode 100644
index 0000000..0aacd84
Binary files /dev/null and b/blog/2025/cogen-motion/IMLE_algorithm.png differ
diff --git a/blog/2025/cogen-motion/imle_moflow.png b/blog/2025/cogen-motion/imle_moflow.png
new file mode 100644
index 0000000..5bc8450
Binary files /dev/null and b/blog/2025/cogen-motion/imle_moflow.png differ
diff --git a/blog/2025/cogen-motion/noise_to_traj_moflow.png b/blog/2025/cogen-motion/noise_to_traj_moflow.png
new file mode 100644
index 0000000..4f19d60
Binary files /dev/null and b/blog/2025/cogen-motion/noise_to_traj_moflow.png differ
diff --git a/blog/2025/cogen-motion/vehicle_1_trajflow.gif b/blog/2025/cogen-motion/vehicle_1_trajflow.gif
new file mode 100644
index 0000000..5050fa5
Binary files /dev/null and b/blog/2025/cogen-motion/vehicle_1_trajflow.gif differ
diff --git a/blog/2025/gen-graph/diffusesg_1.png b/blog/2025/gen-graph/diffusesg_1.png
new file mode 100644
index 0000000..62c556c
Binary files /dev/null and b/blog/2025/gen-graph/diffusesg_1.png differ
diff --git a/blog/2025/gen-graph/diffusesg_2.png b/blog/2025/gen-graph/diffusesg_2.png
new file mode 100644
index 0000000..934f4dd
Binary files /dev/null and b/blog/2025/gen-graph/diffusesg_2.png differ
diff --git a/blog/2025/gen-graph/distribution.png b/blog/2025/gen-graph/distribution.png
new file mode 100644
index 0000000..30b7947
Binary files /dev/null and b/blog/2025/gen-graph/distribution.png differ
diff --git a/blog/2025/gen-graph/graph_diffusion.png b/blog/2025/gen-graph/graph_diffusion.png
new file mode 100644
index 0000000..2552db8
Binary files /dev/null and b/blog/2025/gen-graph/graph_diffusion.png differ
diff --git a/blog/2025/gen-graph/icml_workshop_graph_1.png b/blog/2025/gen-graph/icml_workshop_graph_1.png
new file mode 100644
index 0000000..9dfc7ff
Binary files /dev/null and b/blog/2025/gen-graph/icml_workshop_graph_1.png differ
diff --git a/blog/2025/gen-graph/icml_workshop_graph_2.png b/blog/2025/gen-graph/icml_workshop_graph_2.png
new file mode 100644
index 0000000..1fd7e79
Binary files /dev/null and b/blog/2025/gen-graph/icml_workshop_graph_2.png differ
diff --git a/blog/2025/gen-graph/icml_workshop_graph_3.png b/blog/2025/gen-graph/icml_workshop_graph_3.png
new file mode 100644
index 0000000..3be43db
Binary files /dev/null and b/blog/2025/gen-graph/icml_workshop_graph_3.png differ
diff --git a/blog/2025/gen-graph/icml_workshop_graph_4.png b/blog/2025/gen-graph/icml_workshop_graph_4.png
new file mode 100644
index 0000000..1b29f11
Binary files /dev/null and b/blog/2025/gen-graph/icml_workshop_graph_4.png differ
diff --git a/blog/2025/gen-graph/icml_workshop_graph_task_5.png b/blog/2025/gen-graph/icml_workshop_graph_task_5.png
new file mode 100644
index 0000000..c904ac0
Binary files /dev/null and b/blog/2025/gen-graph/icml_workshop_graph_task_5.png differ
diff --git a/blog/2025/gen-graph/model.png b/blog/2025/gen-graph/model.png
new file mode 100644
index 0000000..4cb4a3a
Binary files /dev/null and b/blog/2025/gen-graph/model.png differ
diff --git a/blog/2025/gen-graph/molecule_jin_20_icml.png b/blog/2025/gen-graph/molecule_jin_20_icml.png
new file mode 100644
index 0000000..941efa8
Binary files /dev/null and b/blog/2025/gen-graph/molecule_jin_20_icml.png differ
diff --git a/blog/2025/gen-graph/novelty_dataset2.png b/blog/2025/gen-graph/novelty_dataset2.png
new file mode 100644
index 0000000..f36ce11
Binary files /dev/null and b/blog/2025/gen-graph/novelty_dataset2.png differ
diff --git a/blog/2025/gen-graph/prob_perm_invar.png b/blog/2025/gen-graph/prob_perm_invar.png
new file mode 100644
index 0000000..d6d959b
Binary files /dev/null and b/blog/2025/gen-graph/prob_perm_invar.png differ
diff --git a/blog/2025/gen-graph/qualitative_result.png b/blog/2025/gen-graph/qualitative_result.png
new file mode 100644
index 0000000..1cc84d2
Binary files /dev/null and b/blog/2025/gen-graph/qualitative_result.png differ
diff --git a/blog/2025/gen-graph/quantitative_result_1.png b/blog/2025/gen-graph/quantitative_result_1.png
new file mode 100644
index 0000000..3c5d0b8
Binary files /dev/null and b/blog/2025/gen-graph/quantitative_result_1.png differ
diff --git a/blog/2025/gen-graph/quantitative_result_2.png b/blog/2025/gen-graph/quantitative_result_2.png
new file mode 100644
index 0000000..071150f
Binary files /dev/null and b/blog/2025/gen-graph/quantitative_result_2.png differ
diff --git a/blog/2025/gen-graph/scene_graph_xu_17_cvpr.png b/blog/2025/gen-graph/scene_graph_xu_17_cvpr.png
new file mode 100644
index 0000000..fafad08
Binary files /dev/null and b/blog/2025/gen-graph/scene_graph_xu_17_cvpr.png differ