AI Security involves the use of artificial intelligence to enhance the security of digital systems and networks. This technology is crucial in identifying and mitigating cyber threats in real time, providing a proactive approach to cybersecurity. AI...
The integration of AI and machine learning in satellite networks is revolutionizing the way satellite communications are managed and optimized. By leveraging machine learning algorithms, satellite networks can predict and adapt to changing...
The Intelligent Commerce Platform by Visa leverages artificial intelligence to transform the way consumers engage in online shopping. This platform enables AI agents to make secure purchases using credit cards,...
Nova Sonic is a new AI voice model unveiled by Amazon, designed to enhance voice interaction capabilities. This model is part of Amazon's efforts to improve the naturalness and responsiveness of AI-driven voice assistants. Nova Sonic leverages...
Armet AI is a generative AI technology developed by Fortanix, designed to enable secure and compliant deployment of generative AI applications within enterprises. This technology focuses on mitigating data exposure risks by providing a secure...
Entropy-Reinforced Planning (ERP) is an advanced algorithmic approach designed to enhance the decoding process of Transformer models, particularly in the context of drug discovery. The primary objective of drug discovery is to identify chemical...
Quantum Machine Learning (QML) is an emerging field that combines principles of quantum computing with machine learning algorithms to enhance computational capabilities. Quantum computing leverages quantum bits, or qubits, which can exist in...
MVSAnywhere is a novel architecture designed for zero-shot multi-view stereo (MVS) depth estimation, a fundamental challenge in computer vision. This technology aims to generalize across diverse domains and depth ranges, addressing the limitations...
Refined Geometry-guided Head Avatar Reconstruction is a technology designed to create high-fidelity 3D head avatars from monocular videos. This technology is particularly useful for virtual human applications, where realistic and detailed head...
DeepSound-V1 is a framework designed for the generation of high-quality, synchronized audio from video and optional text inputs. This technology leverages multi-modal joint learning frameworks to achieve precise alignment between visual and audio...
Vision Language Models (VLMs) are a class of artificial intelligence models that integrate visual and textual data to perform tasks such as image captioning, visual question answering, and object detection. In the context of medical imaging, VLMs...
Machine Learning (ML) decision systems are a subset of artificial intelligence technologies that focus on enabling machines to make decisions based on data. These systems are designed to learn from data inputs, identify patterns, and make decisions...
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by...
In tasks like summarization and open-book question answering (QA), Large Language Models (LLMs) often encounter 'contextual hallucination', where they produce irrelevant or incorrect responses despite having access to accurate source information....
Microscopy is an essential tool in scientific research, enabling the visualization of structures at micro- and nanoscale resolutions. However, the field of microscopy often encounters limitations in field-of-view (FOV), restricting the amount of...
ControlNet is a recent advancement in conditional image generation using diffusion models, which has shown great potential in achieving high-quality images while adhering to user-defined constraints. This technology enables precise alignment between...
Foundation models, a class of deep learning systems, are trained by minimizing reconstruction error over a training set. This process inherently involves memorization and reproduction of training samples, which raises concerns from a copyright...
Machine Learning (ML) has become an essential tool in risk prediction modelling, particularly in the context of large-scale survival data. The UK Biobank study exemplifies the application of ML in predicting health outcomes by analyzing vast...
SimLingo is a model designed to integrate large language models (LLMs) into autonomous driving systems, aiming to improve generalization and explainability. The model addresses the challenge of achieving both high driving performance and extensive...
Unified Dense Prediction of Video Diffusion is a novel approach that integrates video generation with entity segmentation and depth map prediction from text prompts. This unified network utilizes colormap representations for entity masks and depth...
Depth Any Video is a model designed to address the challenges of video depth estimation, which has traditionally been limited by the scarcity of consistent and scalable ground truth data. The model introduces two key innovations: a scalable...
SEGO is an unsupervised framework designed to improve the reliability of graph neural networks (GNNs) by detecting out-of-distribution (OOD) samples during testing. With the increasing amount of unlabeled data, OOD detection is crucial for ensuring...
Multiple Boosting Calibration Trees (MBCT) is a feature-aware binning framework designed to improve the calibration of machine learning classifiers. Traditional classifiers focus on accuracy, but certain applications require calibrated probability...
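To make the notion of calibration by binning concrete, here is a minimal sketch of the expected calibration error (ECE) for a binary classifier. It is not MBCT: it uses plain equal-width bins rather than feature-aware ones, and the function name and the `n_bins` parameter are illustrative assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE for binary predictions using equal-width bins over [0, 1].

    Assumption: plain histogram binning, not the feature-aware binning
    that MBCT describes.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Group (probability, label) pairs by the bin the probability falls in.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


overconfident_gap = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
calibrated_gap = expected_calibration_error([0.5, 0.5], [1, 0])
```

A value of zero means the predicted probabilities match the observed positive rate in every occupied bin. Larger values indicate miscalibration even when accuracy is high.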
HumanVBench is an innovative benchmark designed to evaluate the human-centric video understanding capabilities of Multimodal Large Language Models (MLLMs). Traditional benchmarks focus on object and action recognition, often neglecting the nuances...
Multiplayer Information Asymmetric Contextual Bandits is a novel framework in reinforcement learning that extends the classical single-player contextual bandit problem to a multiplayer setting. In this framework, multiple players each have their own...
Probabilistic Discoverable Extraction is a method designed to measure the memorization of training data in large language models (LLMs). Traditional discoverable extraction methods split a training example into a prefix and suffix, prompting the LLM...
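The prefix/suffix split described above can be sketched as follows. This is a hedged illustration of the traditional, single-attempt check only, assuming the model is prompted with the prefix and its continuation is compared with the true suffix. The `generate` callable and the toy lookup model are hypothetical stand-ins for a real LLM's deterministic decoding, not any library's API.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given prefix tokens and a length, return that many tokens.
Generator = Callable[[List[str], int], List[str]]


def is_discoverably_extractable(
    example: Sequence[str], prefix_len: int, generate: Generator
) -> bool:
    """Return True if the model reproduces the suffix when given the prefix."""
    if not 0 < prefix_len < len(example):
        raise ValueError("prefix_len must leave a non-empty prefix and suffix")
    prefix = list(example[:prefix_len])
    suffix = list(example[prefix_len:])
    continuation = generate(prefix, len(suffix))
    return list(continuation) == suffix


def make_toy_model(memorized: Sequence[Sequence[str]]) -> Generator:
    """Toy 'model' that completes any prefix of a memorized sequence."""

    def generate(prefix: List[str], n_tokens: int) -> List[str]:
        for seq in memorized:
            if list(seq[: len(prefix)]) == list(prefix):
                return list(seq[len(prefix) : len(prefix) + n_tokens])
        return ["<unk>"] * n_tokens  # nothing memorized for this prefix

    return generate


training_example = ["the", "cat", "sat", "on", "the", "mat"]
toy_model = make_toy_model([training_example])
seen = is_discoverably_extractable(training_example, 3, toy_model)
unseen = is_discoverably_extractable(["the", "dog", "ran", "off"], 2, toy_model)
```

A single deterministic attempt gives a yes/no answer per example. A probabilistic variant would instead ask how likely the suffix is under repeated sampling.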
The Hierarchical Neuro-Symbolic Decision Transformer is a framework that combines classical symbolic planning with transformer-based policies to tackle complex decision-making tasks. At the high level, a symbolic planner constructs a sequence of...
Mutual Information (MI) is a measure of the dependency between variables, crucial for various applications in machine learning. However, computing MI in high-dimensional spaces with intractable likelihoods is challenging. This paper presents a...
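For reference, the standard definition of mutual information between two random variables $X$ and $Y$ with joint density $p(x, y)$ and marginals $p(x)$, $p(y)$ is given below. The notation is the textbook one and is not taken from the paper.

```latex
I(X; Y)
  = \mathbb{E}_{p(x, y)}\!\left[ \log \frac{p(x, y)}{p(x)\, p(y)} \right]
  = D_{\mathrm{KL}}\!\left( p(x, y) \,\|\, p(x)\, p(y) \right)
```

The expression is zero exactly when $X$ and $Y$ are independent. Evaluating it requires the densities themselves, which is the source of the difficulty when likelihoods are intractable.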
Foundation models are large-scale deep learning models that serve as a base for various downstream tasks. The training process of these models involves minimizing the reconstruction error over a training set, which can lead to the memorization and...
MERGE is a comprehensive bimodal dataset designed to advance research in Music Emotion Recognition (MER). The field of MER has evolved from audio-centric systems to bimodal ensembles that incorporate both audio and lyrics. However, the development...
The use of neural networks for control variates in lattice field theory represents a novel approach to reducing uncertainty in stochastic methods. Lattice QCD, a key area of study in theoretical physics, often faces challenges due to the inherent...
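The control variate idea itself can be illustrated with a small Monte Carlo sketch. The example below is deliberately simple and rests on stated assumptions: the control variate is hand-picked rather than learned by a neural network, the target is a one-dimensional integral rather than a lattice observable, and all names are illustrative.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def control_variate_estimate(
    f: Callable[[float], float],
    g: Callable[[float], float],
    g_mean: float,
    samples: Sequence[float],
) -> Tuple[float, float, float]:
    """Estimate E[f] using g, whose exact mean g_mean is known, as a control variate.

    Returns (estimate, raw sample variance of f, sample variance after correction).
    The coefficient is fitted on the same samples, which adds a small bias.
    """
    n = len(samples)
    fs = [f(x) for x in samples]
    gs = [g(x) for x in samples]
    f_bar = sum(fs) / n
    g_bar = sum(gs) / n

    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gs) / (n - 1)
    coeff = cov_fg / var_g  # variance-minimizing coefficient

    # Subtract the zero-mean quantity coeff * (g - E[g]) from every sample.
    corrected = [a - coeff * (b - g_mean) for a, b in zip(fs, gs)]
    estimate = sum(corrected) / n

    var_raw = sum((a - f_bar) ** 2 for a in fs) / (n - 1)
    var_corrected = sum((v - estimate) ** 2 for v in corrected) / (n - 1)
    return estimate, var_raw, var_corrected


# Target: E[exp(U)] for U ~ Uniform(0, 1), whose exact value is e - 1.
rng = random.Random(0)
uniform_samples = [rng.random() for _ in range(10_000)]
estimate, var_raw, var_corrected = control_variate_estimate(
    math.exp, lambda u: u, 0.5, uniform_samples
)
```

Because the subtracted term has zero mean, the expectation is unchanged while the per-sample variance drops. The reduction is larger the more strongly the control variate correlates with the quantity being estimated.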
The Multimodal Transformer Neural Network is a machine learning model designed to predict the occurrence of wildfires in real time. This model integrates various advanced AI techniques and statistical methods to analyze large-scale...
FMEval is a comprehensive evaluation suite developed by Amazon SageMaker Clarify, designed to assess the quality and responsibility of large language models (LLMs) in generative AI applications. It provides standardized implementations of metrics to...
depyf is a tool designed to demystify the inner workings of the PyTorch compiler, introduced in PyTorch 2.x. The PyTorch compiler accelerates deep learning programs by operating at the Python bytecode level, which can be opaque to researchers. depyf...
Orthogonal Bases for Equivariant Graph Learning is a framework for learning graph-structured data using graph neural networks (GNNs). Due to the permutation-invariant requirement of graph learning tasks, invariant and equivariant linear layers are...
gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers...
Regularizing Hard Examples in Adversarial Training is a technique that improves the robustness of neural networks by addressing the negative impact of hard-to-learn examples. The approach involves pruning hard examples from the training set, which...
Optimal Experiment Design for Causal Effect Identification is a framework that leverages Pearl's do-calculus to identify causal effects from observational data. When causal effects are not identifiable, the framework designs a collection of...
The Bayesian Sparse Gaussian Mixture Model (BSGMM) is designed for clustering in high-dimensional data where the number of clusters can grow with the sample size. This model addresses the challenge of parameter estimation in high dimensions by...
Directed cyclic graphs are a powerful tool for causal discovery in longitudinal observational data. They allow for the simultaneous discovery of time-lagged and instantaneous causality, which is crucial in understanding complex systems where...