Scaling Laws for Neural Language Models

This paper establishes that language model performance follows predictable power-law scaling relationships with model size, dataset size, and compute, and shows that compute-efficient training favors very large models trained on relatively modest amounts of data and stopped well short of convergence, rather than smaller models trained to convergence.

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
2020-01-23 · cs.LG
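
A minimal sketch of the power-law relationships the summary refers to, assuming the functional forms reported in the paper (L(N) = (N_c/N)^{α_N} for model size and L(D) = (D_c/D)^{α_D} for dataset size); the constants below are approximate values from the paper's fits, and the function names are illustrative only.

```python
# Illustrative sketch of the paper's power-law loss fits (constants approximate).

def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted test loss vs. non-embedding parameter count N:
    L(N) = (N_c / N) ** alpha_N, for models not limited by data or compute."""
    return (n_c / n_params) ** alpha_n

def loss_vs_data(n_tokens, d_c=5.4e13, alpha_d=0.095):
    """Predicted test loss vs. dataset size D in tokens:
    L(D) = (D_c / D) ** alpha_D, for large models trained with early stopping."""
    return (d_c / n_tokens) ** alpha_d

if __name__ == "__main__":
    # Loss declines smoothly and predictably as model size spans orders of magnitude.
    for n in (1e6, 1e8, 1e10):
        print(f"N = {n:.0e} params -> predicted loss ~ {loss_vs_params(n):.2f}")
```

Because both trends are pure power laws, each order-of-magnitude increase in N or D buys a roughly constant multiplicative reduction in loss, which is what makes performance at larger scales predictable from small-scale runs.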