Monday, April 27, 2026
Vision AI Alliances Manager, NVIDIA
Eyal is a Vision AI Alliances Manager at NVIDIA, specializing in computer vision and deep learning. He holds a B.Sc. in Electrical Engineering from the Technion and has focused on video analytics for the past 17 years, working closely with dozens of Vision AI companies in Israel on both the technology and business-development sides.
Build Vision AI Agents With NVIDIA Cosmos Reason VLM and Video Search and Summarization Blueprint
AI systems often struggle to connect perception with reasoning in dynamic real-world environments. NVIDIA’s Cosmos Reason VLM bridges this gap by combining vision, language, and world knowledge to power intelligent video and multimodal understanding. Join this session to learn how to post-train Cosmos Reason Vision Language Model with your own data and build Vision AI agents using NVIDIA NIM microservices and the VSS blueprint. The session will feature real-world use cases and practical guidance on creating intelligent workflows for applications across manufacturing, logistics, safety, and more.
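For readers who want to experiment before the session, NIM microservices expose an OpenAI-compatible chat endpoint, so a vision query can be issued in a few lines of Python. This is a minimal sketch; the endpoint URL, model id, and API key are placeholders, not the session's exact configuration.

```python
# Minimal sketch of querying a vision NIM microservice through the
# OpenAI-compatible chat endpoint that NIM containers typically expose.
# URL, model id, and key below are placeholders (assumptions), not the
# exact setup shown in the session.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")  # hypothetical local NIM

with open("frame.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="nvidia/cosmos-reason",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe any safety violations in this frame."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```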
Professor, Sheba Medical Center & Ariel University
Aidoc
Professor, Tel Aviv University
Sr. Research Manager, IBM Research
Granite Vision – the VLM for Enterprise Workflows
Granite Vision is IBM’s open‑source vision‑language model, engineered to meet the demands of enterprise‑scale workflows. Our latest advancements significantly enhance the model’s ability to understand complex visual structures – such as tables, charts, and forms – and to perform high‑fidelity semantic and structured information extraction from real‑world business documents. In this session, we will highlight the technologies behind these capabilities, share key insights from our research, and demonstrate how they enable more intelligent RAG pipelines and agentic workflows, ultimately accelerating critical enterprise processes.
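As a rough illustration of how such an open model is typically consumed, the sketch below loads a vision-language checkpoint with Hugging Face transformers and asks it to extract a table. The model id follows Granite Vision's naming on the Hub but should be treated as an assumption, as should the chat-template details.

```python
# Illustrative sketch of loading an open VLM for document understanding;
# the checkpoint name and prompt format are assumptions, not verified
# details of the release discussed in the talk.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/granite-vision-3.2-2b"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("invoice_page.png")
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Extract the table as JSON."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0], skip_special_tokens=True))
```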
Associate Professor, Ben-Gurion University of the Negev
Deep Learning for Geometric Alignment
Senior Researcher, Autobrains & Tel Aviv University
Leah Bar received her B.Sc. in Physics, M.Sc. in Biomedical Engineering, and Ph.D. in Electrical Engineering from Tel Aviv University. She then completed her postdoctoral fellowship at the University of Minnesota. She is currently a Senior Researcher at Autobrains and in the Applied Mathematics Department at Tel Aviv University. Her work lies at the intersection of machine and deep learning, image processing, computer vision, and inverse problems, with particular interest in bridging mathematical structure and data-driven methods.
A Geometric-Probabilistic View of Diffusion and Manifold Projection for Image Restoration
Natural images are often viewed as concentrating near low-dimensional structures embedded in high-dimensional spaces. Yet in many modern approaches this geometry remains implicit, absorbed into large probabilistic models. In this talk, I revisit Blind Image Denoising from a geometric-probabilistic perspective. We couple an encoder-decoder representation with a learned distance function, interpreting restoration as iterative projection toward the set of clean and meaningful images. This viewpoint clarifies diffusion-type dynamics and leads naturally to a deterministic alternative, the Manifold-Probabilistic Projection Model (MPPM), applicable in both pixel and latent spaces and exhibiting robust behavior across diverse degradations.
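The projection view lends itself to a compact sketch: starting from a degraded image, repeatedly step against the gradient of a learned distance-to-manifold function. The toy network below stands in for the learned distance; this illustrates the deterministic iteration only, not the authors' implementation.

```python
# Toy sketch of "restoration as projection": iterate gradient steps on a
# learned distance-to-the-clean-image-set. dist_net is a stand-in, not
# the talk's trained distance function.
import torch
import torch.nn as nn

dist_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, 1))

def project(x, steps=50, eta=0.1):
    x = x.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        d = dist_net(x).sum()            # estimated distance to the clean-image set
        (grad,) = torch.autograd.grad(d, x)
        x = (x - eta * grad).detach()    # deterministic projection step (no injected noise)
    return x

restored = project(torch.rand(1, 1, 64, 64))
```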
Professor, Technion - Israel Institute of Technology
Tomer Michaeli is a Professor in the Electrical and Computer Engineering Department at Technion – Israel Institute of Technology. He completed his BSc and PhD degrees in that department in 2005 and 2012, respectively. After a postdoctoral period at the Weizmann Institute of Science, he joined the Technion as a faculty member in 2015. His research lies in the fields of Computer Vision and Machine Learning. He is the recipient of several awards, among which are the Krill Prize for Excellence in Scientific Research by the Wolf Foundation (2020), the Best Paper Award (Marr Prize) at ICCV 2019, the Best Paper Award at SIGGRAPH 2025, and the Best Student Paper Award at ICCV 2025.
Editing real images with pre-trained flow models
Flow models can generate images and videos based on textual descriptions. However, in many cases, it is desirable to edit real images/videos rather than generating synthetic ones. Repurposing a pre-trained flow model for editing has attracted significant attention in recent years. Yet, existing solutions tended to struggle either with maintaining similarity to the source image/video or with adhering to the text describing the desired edit. This talk will show that the root cause of those limitations is the reliance on an “editing-by-inversion” paradigm, where the source image/video is first mapped to the initial noise that generates it. It will be demonstrated how breaking away from this paradigm allows achieving significantly better results. The proposed inversion-free, training-free, and model-agnostic approaches have seen widespread adoption and integration into popular models, achieving results that compete even with training-based methods.
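To make the contrast concrete, the toy sketch below illustrates one inversion-free idea: rather than mapping the source image back to noise, integrate the difference between target- and source-conditioned velocity fields starting from the source itself. The dummy velocity function stands in for a pretrained flow model; this is a generic illustration, not the exact method presented in the talk.

```python
# Toy illustration of inversion-free editing with a flow model: no
# inversion to noise; instead integrate the *difference* of conditional
# velocities starting from the source image. `velocity` is a dummy
# stand-in for a pretrained flow network.
import torch

def velocity(z, t, cond):
    return cond * torch.ones_like(z) - z  # dummy conditional velocity field

def edit(x_src, src_cond, tgt_cond, n_steps=50):
    z, dt = x_src.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        z = z + dt * (velocity(z, t, tgt_cond) - velocity(z, t, src_cond))
    return z

edited = edit(torch.rand(1, 3, 8, 8), src_cond=0.0, tgt_cond=1.0)
```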
Chief Technology Officer, Mobileye
Shai Shalev-Shwartz is the chief technology officer of Mobileye. He leads software and algorithm technology development for Mobileye’s advanced driving assist systems (ADAS), highly autonomous and fully autonomous driving solutions, and enabling technologies including Responsibility-Sensitive Safety (RSS) and Road Experience Management™ (REM™) maps. He is also a professor at the Rachel and Selim Benin School of Computer Science and Engineering at the Hebrew University of Jerusalem.
Shalev-Shwartz developed the science behind RSS as well as an economically viable path toward a future where there are no casualties from car accidents. He is drawn to autonomous driving as the first large-scale deployment of artificial intelligence and machine learning outside of the cybernetic world.
Shalev-Shwartz is best known for pioneering research in machine learning and was listed as one of the 100 most influential researchers worldwide in 2016 by AMiner. In 2014, he co-authored one of the top books used by major universities on theoretical machine learning: “Understanding Machine Learning: From Theory to Algorithms.”
Before joining Hebrew University and Mobileye, Shalev-Shwartz was a research assistant professor at Toyota Technological Institute in Chicago, and also worked in research at both Google and IBM. Shalev-Shwartz has written more than 100 research papers, focusing on machine learning, online prediction, optimization techniques, and practical algorithms.
In 2020, he was awarded the prestigious Michael Bruno Award for his groundbreaking research and his unique contribution to computer science and engineering.
Scaling Autonomous Driving: Explicit State, Synthetic Societies, and Foundation-Model Supervision
Autonomous driving sits at the intersection of two worlds: the geometric precision demanded by safety-critical robotics, and the open-ended semantic diversity captured by modern foundation models. To reach human-level reliability, self-driving systems must master both.
This keynote introduces a new architectural perspective that combines explicit online sensing state, semantic reasoning, and large-scale closed-loop training in artificially simulated “driving societies.” I will show how foundation models can be used not inside the high-frequency control loop, but around it: as automatic labelers, long-tail generators, anomaly detectors, and slow-thinking semantic supervisors.
By merging real fleet data with semantic simulation, by learning intentions and interactions rather than only trajectories, and by enforcing principled safety interfaces on top of learned policies, we obtain a system that scales like modern AI while satisfying the constraints of safety-critical engineering.
This architecture highlights a roadmap for the next generation of SDS: structured perception is not a limitation for end-to-end learning, but a force multiplier that enables data efficiency, reasoning, and safety in the long tail of autonomous driving.
Assistant Professor, Bar-Ilan University
COPER: Correlation-based Permutations for Multi-View Clustering
Combining data from multiple sources often leads to better insights, yet many existing multi-view clustering methods are tailored to specific domains or require complex, multi-stage pipelines. We present a practical end-to-end deep learning framework that works across diverse data types, including images and tabular data. Our approach learns unified representations that capture shared structure across sources and enables consistent grouping without manual labels. The method is scalable, robust to noise, and supported by both theoretical insights and extensive experiments on ten benchmark datasets, demonstrating strong and reliable performance across varied real-world settings.
AI Research Scientist, Earth Dynamics AI
Michal Holtzman Gazit is a Computer Vision and AI Researcher at Earth Dynamics AI with over 25 years of expertise in image processing, computer vision, and deep learning. Her career evolved from a foundation in medical imaging to sophisticated 2D and 3D structural analysis. She holds a BSc and MSc in Electrical Engineering and a PhD in Computer Science from the Technion, and performed post-doctoral research in inverse problems at the University of British Columbia. Michal specializes in leading the transition of advanced research to production-ready systems. Currently, she develops Geoscience Foundation Models for mineral exploration, utilizing AI to decode Earth’s 3D structures and revolutionize resource discovery.
Multi-Modal Geologic Intelligence: 3D Inversion and Map Synthesis via Generative Foundation Models
The integration of generative foundation models into geoscientific workflows represents a transformative shift in solving complex inverse problems. We explore advanced architectures for map synthesis via Conditional Flow Matching and volumetric inversion via 3D VAEs, leveraging magnetic, gravity, and drilling data. By constraining multi-modal generative priors with physical laws, we synthesize high-fidelity geologic insights from sparse, unorganized measurements. This approach accelerates mineral exploration by significantly reducing the cost and uncertainty of targeting subsurface anomalies. This synergy of cross-modal generative processes and potential field theory defines a new era of geologic intelligence.
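As background, conditional flow matching trains a network to regress the velocity of a straight path between noise and data. A minimal training step might look as follows, with a toy MLP and made-up shapes standing in for the map-synthesis model.

```python
# Sketch of one conditional flow-matching training step, the generative
# backbone mentioned for map synthesis. The tiny MLP and data shapes are
# placeholders, not the talk's architecture.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2 + 1 + 1, 128), nn.SiLU(), nn.Linear(128, 2))

def cfm_loss(x1, cond):
    """x1: target samples (e.g., geologic map patches); cond: conditioning."""
    x0 = torch.randn_like(x1)                 # noise endpoint
    t = torch.rand(x1.shape[0], 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight path
    v_target = x1 - x0                        # the path's constant velocity
    v_pred = net(torch.cat([xt, t, cond], dim=1))
    return ((v_pred - v_target) ** 2).mean()

loss = cfm_loss(torch.randn(64, 2), torch.randn(64, 1))
loss.backward()
```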
Principal Image Processing Specialist, Abbott Laboratories
Lorina Dascal is a principal computer vision and image processing specialist at Abbott Labs. Her research interests include deep learning for image/video understanding, 3D medical shapes, multimodal fusion of imaging, and neural partial differential equations in vision. She has authored 14 published papers and has earned 11 patents. She holds a PhD in Applied Mathematics from Tel Aviv University and was a postdoctoral fellow and research assistant in the Computer Science Department at the Technion.
Automatic 3D Surface Reconstruction of the Left Atrium from Unorganized Contours
ICE (intracardiac echocardiography) is a valuable tool in cardiac catheterization and electrophysiology (EP) procedures, assisting physicians in visualizing anatomical details and in monitoring procedures like catheter ablation, septal defect closure, left atrial appendage occlusion, and valve implantation. Our aim is to automatically create an accurate three-dimensional surface model of the left atrium from automatically segmented boundaries of ICE images. We propose a modified Poisson reconstruction method with additional geometric constraints, which enables the creation of accurate, highly detailed, and computationally efficient surfaces from diverse sets of unorganized and sparse contours.
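For orientation, the classical building block is standard (unconstrained) Poisson surface reconstruction from oriented points, shown below with Open3D. The talk's additional geometric constraints are not reproduced here, and the random points stand in for segmented ICE contours.

```python
# Baseline sketch: vanilla Poisson surface reconstruction with Open3D
# from an unorganized point set. The method in the talk adds geometric
# constraints on top of this classical step, which are not shown here.
import numpy as np
import open3d as o3d

pts = np.random.rand(5000, 3)                      # stand-in for ICE contour points
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pts)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
```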
Tel Aviv University
Hana Bezalel holds an M.Sc. from Tel Aviv University, supervised by Hadar Averbuch-Elor. Her CVPR 2025 publication focuses on relative, in-the-wild pose estimation in extreme settings. Previously Lead Computer Vision Engineer at Rafael, she currently serves as an Algorithm Developer at Mobileye, where her work centers on geometric computer vision and spatial understanding.
Extreme Rotation Estimation in the Wild
We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.
Senior AI Applied Researcher, Bluewhite Robotics
Or Kozlovsky is a Senior AI Applied Researcher at Bluewhite Robotics and was recently a Student Researcher at Google. His work and research focus on generative AI, spatial AI, and real-time computer vision in both 2D and 3D domains. He has a strong record of bridging cutting-edge research with real-world computer vision applications across a broad range of areas, including medical, space, entertainment, and robotics.
Currently, Or is an M.Sc. student at Tel Aviv University under the supervision of Prof. Amit Bermano, and holds dual B.Sc. degrees in Electrical Engineering and Economics from the Technion.
BINA: Bootstrapped Intelligence for Novel Adaptation
Robotic systems in real-world environments face conditions unseen during development, and while foundation models promise better generalization, integrating them under real-time onboard constraints remains challenging. We introduce BINA, a deployment-driven framework for online perceptual adaptation. BINA leverages online sparse supervision from a VLM to incrementally distill semantic knowledge into an onboard perception module. Beyond single-robot learning, BINA supports fleet-level knowledge aggregation, enabling scalable adaptation to new environments. Demonstrated on off-road traversability estimation, BINA rapidly converges from zero prior knowledge through operator-guided driving. Although demonstrated on traversability, BINA is task-agnostic and applicable to other perception and autonomy tasks.
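In spirit, the online adaptation loop can be sketched as sparse VLM supervision distilled into a small onboard network as frames stream in. The `query_vlm` stub and the tiny per-pixel classifier below are placeholders, not the BINA implementation.

```python
# Conceptual sketch of online distillation from sparse VLM labels into a
# small onboard model; all components are illustrative stand-ins.
import torch
import torch.nn as nn

onboard = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 2, 1))          # per-pixel traversable / not
opt = torch.optim.Adam(onboard.parameters(), lr=1e-3)

def query_vlm(frame):
    """Placeholder for a slow VLM call returning a few labeled pixels."""
    ys, xs = torch.randint(0, 64, (8,)), torch.randint(0, 64, (8,))
    return ys, xs, torch.randint(0, 2, (8,))

for frame in torch.rand(10, 1, 3, 64, 64):             # stream of frames
    ys, xs, labels = query_vlm(frame)                   # sparse supervision only
    logits = onboard(frame)[0, :, ys, xs].T             # (8, 2) at supervised pixels
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```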
AI Researcher, The Hebrew University of Jerusalem
Latent Space JAM: Layout-Guided Video Generation
R&D, Lightricks
Holds an M.Sc. in Computer Science from HUJI under Prof. Shmuel Peleg's supervision. Currently works at Lightricks; main research interest is video generation.
Sr. Algorithm Developer, Align Technology, Inc.
Zvi Stein is an Algorithms Engineer at Align Technology, working on computer vision and 3D geometry pipelines for multi-view scanning. His work focuses on surface reconstruction, mesh refinement, and performance-critical implementations with GPU acceleration. He has experience building end-to-end systems, from image-based inference to real-time processing and quality evaluation, aimed at improving surface accuracy and robustness in challenging acquisition conditions.
Mesh Refinement from Multi-View RGB Using Image-Predicted Surface Normals
Accurate surface refinement in regions with fine geometric detail remains challenging in practical 3D acquisition pipelines, where reconstructed meshes are often limited by scan resolution and noise. Although many scanning systems capture high-resolution multi-view RGB imagery, exploiting these images for metric geometry refinement is difficult due to scale ambiguity and perspective effects inherent to wide field-of-view 2D projections.
We present a geometry-refinement pipeline that converts multi-view RGB observations into a consistent surface normal field and integrates it to deform an initial mesh toward a refined surface. The central approach is to use image-predicted surface normals as the primary refinement signal, providing scale-consistent geometric constraints that are not directly available from intensity values alone. Input views are selected and scored based on geometric visibility and viewpoint diversity to ensure robust coverage and stable convergence across the surface. To mitigate projection-induced distortions, images are undistorted and re-parameterized into locally aligned patches, with corresponding rotations applied to the predicted normals.
A U-Net model trained from scratch predicts normal maps on a dedicated network surface, while deformation is applied on a separate, explicitly upsampled sampling surface designed to absorb high-frequency detail beyond the resolution of the original reconstruction; an additional simplified surface supports efficient view selection and scoring. The refined normal field is fused by solving a Poisson formulation to recover metrically consistent vertex displacements. Experimental results demonstrate improved reconstruction fidelity in high-curvature and detail-critical regions, recovering subtle structures that are commonly smoothed or missing in scan-resolution-limited meshes.
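To convey the core objective, the sketch below optimizes vertex positions so that face normals agree with an image-predicted normal field. The pipeline described above instead solves a Poisson system for metrically consistent displacements, so plain gradient descent here is purely illustrative.

```python
# Simplified sketch of normal-driven mesh refinement: align face normals
# with a predicted normal field via gradient descent. Illustrates the
# objective only; the talk's method solves a Poisson formulation instead.
import torch

verts = torch.rand(100, 3, requires_grad=True)
faces = torch.randint(0, 100, (180, 3))
target_n = torch.nn.functional.normalize(torch.rand(180, 3), dim=1)  # predicted normals
opt = torch.optim.Adam([verts], lr=1e-2)

for _ in range(200):
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = torch.cross(v1 - v0, v2 - v0, dim=1)           # unnormalized face normals
    n = torch.nn.functional.normalize(n, dim=1)
    loss = (1 - (n * target_n).sum(dim=1)).mean()      # cosine misalignment
    opt.zero_grad(); loss.backward(); opt.step()
```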
Software Tech Lead, Senior AI Engineer, General Motors
Introduction to Multi-Agent Architecture Patterns
Senior Algorithm Engineer, WSC Sports
Unified Sports Perception: Single-Pass Extraction of Dense Player Metadata via Lightweight VLMs
Algorithm Team Lead, WSC Sports
Assistant Professor (Senior Lecturer), The Hebrew University of Jerusalem
Retrieval-Augmented 3D Vision: Analysis and Generation in the Long Tail
Algorithm Engineer, Corephotonics, a Samsung Company
Abraham (Avi) Pelz received his B.Sc. and M.Sc. degrees in Electrical and Electronics Engineering from Tel Aviv University. Since 2018, he has been an Algorithm Researcher at Corephotonics, specializing in vision algorithm proofs-of-concept. His research encompasses challenging data scenarios, including self-supervision, domain adaptation, neural uncertainty estimation, and data scaling. Avi is the corresponding author of Kim, Jaeseong, Abraham Pelz, Michael Scherer, and David Mendlovic, “On the Effectiveness of Sparse Linear Polarization Pixels for Face Anti-Spoofing,” IEEE Sensors Journal 25 (2025).
Data Efficiency Estimation from Tiny Datasets: Sparse Polarization Biometric Case Study
Developing robust AI systems requires massive datasets, a significant hurdle for emerging technologies. This talk explores how to evaluate data utility early in the R&D cycle, drawing from our recent paper, "On the Effectiveness of Sparse Linear Polarization Pixels for Face Anti-Spoofing." We not only show that sparse linear polarization is highly effective for face anti-spoofing (tenfold error reduction relative to RGB), but also demonstrate that less informative representations require exponentially more training data to reach given specifications—a trend predictable using tiny datasets. This talk provides a practical case study for comparing physical representations early in research, helping teams identify the most promising technologies before hitting the "data bottleneck."
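The extrapolation itself is simple to reproduce: fit a power law error(n) ≈ a·n^b in log-log space to a handful of tiny-dataset measurements, then solve for the sample count that reaches a target error. The numbers below are made up for illustration.

```python
# Sketch of the kind of data-scaling extrapolation described: fit a
# power law to a few tiny-dataset error measurements, then predict how
# much data a target error requires. All numbers are illustrative.
import numpy as np

n = np.array([100, 200, 400, 800])            # tiny training-set sizes
err = np.array([0.30, 0.24, 0.19, 0.15])      # measured validation errors (made up)

b, log_a = np.polyfit(np.log(n), np.log(err), 1)   # linear fit in log-log space
a = np.exp(log_a)                                   # err ~ a * n**b  (b < 0)

target = 0.05
n_needed = (target / a) ** (1.0 / b)
print(f"extrapolated samples for {target:.0%} error: {n_needed:,.0f}")
```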
Computer Vision Researcher, Tel Aviv University
Leveraging Diffusion Models towards PE Early Diagnosis using CXRs
Patients with respiratory issues in emergency rooms typically undergo chest X-rays (CXR), which are accessible and low-cost but provide limited low-resolution imaging. Higher-risk patients are referred for more detailed and expensive CT or Computed Tomography Pulmonary Angiography (CTPA) scans, which involve higher radiation. The study focuses on detecting Pulmonary Embolism (PE), usually invisible in CXR but detectable in CTPA.
By leveraging paired CXR–CTPA data, we investigate two complementary diffusion-based strategies that transfer diagnostic knowledge from the high-fidelity CTPA modality to the widely available CXR domain. In the first, a conditional diffusion model is trained to generate 3D CTPA-like representations directly from 2D CXRs, enriching the initial imaging with high-resolution vascular cues and improving PE detection performance from 69% to 80% AUC. In addition, we introduce a latent-space diffusion prior that performs cross-modal knowledge distillation, generating CTPA-informed classifier embeddings from CXR embeddings without explicit image synthesis, enabling state-of-the-art PE classification using CXR alone. Together, these approaches demonstrate that diffusion models can act as powerful cross-modal bridges, either through image generation or embedding-level supervision, substantially enhancing early PE diagnosis from CXRs while reducing reliance on expensive, high-radiation imaging. Although not a replacement for clinical CTPA, this framework highlights a scalable and generalizable pathway for augmenting low-cost imaging with high-level diagnostic insight. Our contributions through these works are as follows: (1) the first true CXR→CTPA diffusion pipeline with diagnostic validation; (2) a novel 1D-diffusion prior for CXR→CTPA embedding distillation; (3) state-of-the-art CXR-based PE classification; (4) a modality-agnostic framework extendable to other cross-modal imaging tasks, facilitating wider access to advanced diagnostic tools.
Algorithm Developer, Applied Materials
Warp and Render: A Dual-Network Framework for Geometry-Controlled Simulation in Semiconductor Process Diagnostics
Data Scientist, Microsoft
Tom Hirshberg is a data scientist at Microsoft in the Edge AI group, where she develops multimodal AI systems for large-scale video understanding. Previously, she was a research intern at Microsoft in Redmond, focusing on optimization and control methods for autonomous robotic systems.
Tom holds a BSc and an MSc (cum laude) in Computer Science from the Technion. During her studies, she was part of the algorithm team that developed the first student autonomous Formula race car at the Technion. Her master’s thesis explored acoustic-based indoor localization for drones, bridging signal processing, machine learning, and robotics.
Object Detection and Tracking in Live Streams Using Textual and Visual Detailed Descriptions
In the live video analysis domain, everything must happen quickly, efficiently and accurately. While traditional object detection systems rely on predefined classes, modern applications require flexibility to describe, detect, and track any object in live video streams. This brings algorithmic and computational challenges, especially for edge devices, like handling detailed attributes (e.g., “a red vintage car”), integrating specialized trackers, and managing high camera loads efficiently.
This lecture presents an algorithm for detecting and tracking objects in live video streams using detailed textual description, image examples or both. Our approach is already successfully implemented in Microsoft’s Azure AI Video Indexer.
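A generic version of the open-vocabulary step can be sketched with CLIP: score candidate detector crops against a free-text description and keep the best matches. This illustrates the idea only and is not the Azure AI Video Indexer implementation.

```python
# Minimal sketch of open-vocabulary filtering: rank detector crops by
# CLIP similarity to a free-text description. Generic illustration, not
# the production system described in the talk.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

crops = [Image.new("RGB", (224, 224)) for _ in range(4)]  # stand-ins for detector crops
inputs = proc(text=["a red vintage car"], images=crops, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
scores = out.logits_per_image.squeeze(1)       # similarity of each crop to the text
keep = scores > scores.mean()                  # naive threshold for illustration
```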
Research Manager, IBM Research
Adaptive Resolution Processing in Vision-Language Models
AI Researcher, The Hebrew University of Jerusalem
I am a PhD candidate at The Hebrew University of Jerusalem, advised by Prof. Dani Lischinski, and a Senior Researcher at General Motors. My research focuses on image and video generation and manipulation, with a particular interest in adverse viewing conditions and out-of-distribution (OOD) concepts, primarily related to automotive scenarios.
Seed-to-Seed: Unpaired Image Translation in Diffusion Seed Space
We introduce Seed-to-Seed Translation (StS), a framework for unpaired image-to-image translation built on two primary contributions. First, we provide an in-depth analysis of the space of inverted latents ("seeds"), denoted "seed-space", demonstrating that it encodes critical semantic features for discriminative tasks. Second, we leverage these features through a novel hybrid mechanism that combines a GAN with a diffusion model to perform unpaired seed-to-seed translation (i.e., image translation in the seed space) before the diffusion sampling steps start.
We show that our method outperforms existing GAN and diffusion-based baselines in complex automotive scene synthesis, while establishing a novel paradigm for latent-based image manipulation.
Researcher, Volcani Institute
Dr. Iftach Klapp (Ph.D., Electrical Engineering) joined the Volcani Institute following a short post-doctoral training and founded the Agro-Optics and Sensing Laboratory, dedicated to developing advanced electro-optical sensing systems for agriculture. The lab studies interactions between sensors, objects, and the environment, creating optical systems and embedding physical models into data processing to ensure accurate sensing in dynamic conditions. Its approach integrates physical modeling with inverse-problem methods, including Physically Aware Convolutional Neural Networks, to extract reliable, meaningful information from sensor data. Prior to his PhD studies, he worked for six years in the Automatic Optical Inspection industry as an opto-mechanical R&D engineer.
Affordable Thermal Imaging: Overcoming Accuracy and Resolution Limits with AI
Plant temperature serves as a critical indicator of crop health, especially for identifying water stress that triggers stomatal closure and canopy heating. Although radiometric thermal IR cameras can detect such stress at an early stage, their high cost (>$20,000) restricts their use in agriculture. More affordable uncooled thermal cameras (~$4,000) present a promising alternative but suffer from drift, non-uniformity, limited accuracy (±5 °C), and low spatial resolution. To overcome these limitations, we developed deep-learning methods for non-uniformity correction (NUC) and super-resolution (SR), enhancing image resolution by factors of ×2 and ×4. In field experiments using a low-cost FLIR TAU2 alongside a scientific-grade FLIR A655Sc mounted on the same drone, our end-to-end system achieved real-time processing (<1 s per frame) with high fidelity, reducing root mean square error to ~0.5 °C. The derived Crop Water Stress Index (CWSI) closely matched reference measurements, with deviations of only ~1.4–1.9%, demonstrating that this approach enables precise, affordable, and scalable agricultural monitoring for water management.
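The stress index mentioned above follows the standard definition CWSI = (T_canopy - T_wet) / (T_dry - T_wet). A minimal per-pixel computation, with assumed reference temperatures, looks like this.

```python
# Per-pixel Crop Water Stress Index from a corrected thermal frame,
# using the standard definition; reference temperatures are assumed
# values for illustration.
import numpy as np

t_canopy = np.random.uniform(24.0, 34.0, size=(480, 640))  # corrected thermal frame, deg C
t_wet, t_dry = 22.0, 36.0                                  # wet/dry reference temps (assumed)

cwsi = np.clip((t_canopy - t_wet) / (t_dry - t_wet), 0.0, 1.0)
print(f"mean CWSI: {cwsi.mean():.2f}")   # 0 = unstressed, 1 = fully stressed
```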
Senior Data Scientist, NVIDIA
Advancing AI in Radiology Research with NVIDIA Clara Open Medical Imaging Models
This session presents the Clara Medical Open Models, a suite of pre-trained deep learning models designed to advance research in medical image analysis. We will present the architectural principles, dataset curation strategies, and benchmarking protocols that underpin these models, emphasizing explainability, reproducibility, and domain generalization. Case studies will illustrate their application across diverse imaging modalities, including CT, MRI, and digital pathology. Through this exploration, the session will highlight how open, standardized model repositories accelerate scientific discovery and enable robust evaluation frameworks in healthcare research.
Ben-Gurion University of the Negev
Omri Hirsch is an M.Sc. student in Computer Science at Ben-Gurion University of the Negev, conducting research in Computer Vision and Machine Learning in the Vision, Inference, and Learning (VIL) group under the supervision of Prof. Oren Freifeld. His research focuses on efficient geometric learning and joint image alignment, and he is the first author of FastJAM, recently accepted to NeurIPS 2025. He has previously worked on medical imaging in collaboration with Dr. Yonatan Winetraub’s lab at Stanford University. Omri is a recipient of competitive scholarships for outstanding M.Sc. students in AI and Data Science for two consecutive years.
FastJAM: a Fast Joint Alignment Model for Images
Joint Alignment (JA) aims to align a collection of images into a shared coordinate frame such that semantically corresponding features coincide spatially. Despite its importance in many vision applications, existing JA methods often rely on heavy optimization, large-capacity models, and extensive hyperparameters, leading to long training and limited scalability. In this talk, we present FastJAM, a fast joint alignment framework that reframes JA as a graph-based problem over sparse keypoints. FastJAM leverages pairwise correspondences and a graph neural network to efficiently predict per-image transformations, achieving state-of-the-art alignment quality while reducing runtime from minutes or hours to just seconds.
PhD Candidate, Bar-Ilan University
Natalya Segal is a data scientist and biomedical AI researcher with experience leading data science teams and developing AI/ML systems across multiple domains. She is an inventor on multiple granted U.S. patents recognized and adopted by leading technology companies. As a PhD candidate at Bar-Ilan University, she is pioneering a contactless, affordable brain-computer interface (BCI) that uses remote optical sensing and deep learning to decode internal speech. Her work advances optical neural decoding and cortical monitoring. She holds an MSc in Electrical Engineering and a BSc in Mathematics and Computer Science.
Adapting Long-Video Masked Autoencoders to High-Speed Brain Imaging
Self-supervised video foundation models have recently shown strong transferability across natural video tasks, yet their applicability to domains with radically different spatiotemporal statistics remains largely unexplored. We investigate whether long-video masked autoencoders (LV-MAE), originally designed for low-frame-rate semantic natural videos, can be adapted to high-speed coherent imaging that lacks semantic structure. We apply LV-MAE to speckle-pattern video recordings captured at 1000 fps from the scalp overlying language-related cortex during silent speech tasks. Despite extreme differences in resolution, modality, and temporal scale, LV-MAE learns transferable representations that enable accurate downstream classification with minimal labeled data. Using leave-one-subject-out evaluation with one-minute subject-specific calibration, the proposed approach achieves strong cross-subject performance on millisecond-scale inputs. These results suggest that masked video representation learning can generalize beyond natural video, enabling efficient learning in specialized high-speed imaging domains.
Researcher, Ben-Gurion University of the Negev
Shachar Shmueli is an Electrical Engineering Master’s student at Ben-Gurion University, specializing in the security of generative AI. His research focuses on developing robust attacks and defenses for diffusion models. Alongside his studies, he works as a Data Engineer at the startup Octup.
Black-box Adversarial Attack on Stable Diffusion Models
We present a black-box adversarial attack on diffusion models that uses genetic algorithms to evolve adversarial prompts. Our method injects evolved code strings into input prompts, modifying generated images to match target semantic content. Operating in a black-box setting with only image outputs, we use CLIP embeddings to measure semantic similarity. Evaluated on Stable Diffusion across multiple categories, our attack successfully manipulates image generation while maintaining perceptual quality. Multi-classifier evaluation demonstrates significant classification changes with minimal degradation, revealing vulnerabilities in diffusion model robustness.
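Schematically, the attack is a standard genetic-algorithm loop with CLIP similarity as the fitness function. The sketch below uses illustrative population and mutation settings and public checkpoints; it is not the paper's exact configuration.

```python
# Schematic black-box loop: evolve a prompt suffix and score each
# candidate by CLIP similarity between the generated image and a target
# concept. Settings and checkpoints are illustrative only.
import random, string, torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def fitness(img, target_text):
    inputs = proc(text=[target_text], images=[img], return_tensors="pt", padding=True)
    with torch.no_grad():
        return clip(**inputs).logits_per_image.item()

def mutate(s):
    i = random.randrange(len(s))
    return s[:i] + random.choice(string.ascii_letters) + s[i + 1:]

base, target = "a photo of a dog", "a cat"
population = ["".join(random.choices(string.ascii_letters, k=8)) for _ in range(6)]
for _ in range(5):                                   # a few GA generations
    scored = [(fitness(pipe(f"{base} {s}").images[0], target), s) for s in population]
    scored.sort(reverse=True)
    elite = [s for _, s in scored[:3]]               # keep the fittest suffixes
    population = elite + [mutate(random.choice(elite)) for _ in range(3)]
```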
PhD Student, Technion - Israel Institute of Technology
Meir Yossef Levi (Yossi Levi) is in the final stages of his Ph.D. at the Technion, advised by Prof. Guy Gilboa, after receiving both his B.Sc. and M.Sc. in Electrical Engineering from the Technion. His research focuses on multimodal representation learning, with a particular interest in understanding the latent geometry of vision-language models and its implications. His recent work centers on the representation of foundation models, with several papers accepted to ICML and ICLR on this topic. Prior to this, he studied robust classification in 3D vision, with publications at ICCV and 3DV.
The Geometry and Likelihood Structure of CLIP Embeddings
The talk mainly covers two papers to be presented at ICML 2025. I will present our recent work analyzing the geometry of CLIP’s latent space from both geometric and probabilistic perspectives. We show that the embedding space is better characterized by two distinct, shifted ellipsoids, rather than a shared hypersphere. This finding challenges common assumptions about CLIP’s latent structure. Building on this double-ellipsoid perspective, we introduce a new measure called conformity, which captures how closely a sample aligns with its Modality Mean. Finally, I will introduce Whitened CLIP (W-CLIP) — a simple, linear transformation of the latent space into an isotropic space. It enables the use of embedding norms as a surrogate for likelihood approximation. This approach supports a wide range of applications, including domain shift detection and the identification of generative artifacts.
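The whitening step is easy to state concretely: estimate the mean and covariance of a collection of embeddings, apply the inverse square root of the covariance, and read the norm of the whitened vector as a Mahalanobis distance, a likelihood surrogate under a Gaussian assumption. A numpy sketch with random stand-in embeddings:

```python
# Sketch of whitening an embedding collection; after the transform, the
# norm is a Mahalanobis distance to the mean. Random data stands in for
# real CLIP embeddings.
import numpy as np

emb = np.random.randn(10000, 512)             # stand-in for CLIP embeddings
mu = emb.mean(axis=0)
cov = np.cov(emb - mu, rowvar=False)

vals, vecs = np.linalg.eigh(cov)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T     # inverse square root of covariance

def whiten(x):
    return (x - mu) @ W                        # isotropic, zero-mean space

scores = np.linalg.norm(whiten(emb), axis=1)  # larger norm = less typical sample
```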
Senior Research Scientist, OriginAI
Dvir Samuel is a researcher at OriginAI. He holds a PhD from Bar-Ilan University, and his research focuses on long-tail and few-shot learning, as well as generative models, with a particular emphasis on diffusion- and flow-matching–based methods for image and video editing and personalization. At OriginAI, he develops scalable and practical methods to advance generative AI across images, videos, and 3D content.
In Omnimatte, one aims to decompose a given video into semantically meaningful layers, including the background and individual objects along with their associated effects, such as shadows and reflections. Existing methods often require extensive training or costly self-supervised optimization. In this paper, we present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte. It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos. These are accomplished by adapting zero-shot image inpainting techniques for video object removal, a task they fail to handle effectively out-of-the-box. To overcome this, we introduce temporal and spatial attention guidance modules that steer the diffusion process for accurate object removal and temporally consistent background reconstruction. We further show that self-attention maps capture information about the object and its footprints and use them to inpaint the object's effects, leaving a clean background. Additionally, through simple latent arithmetic, object layers can be isolated and recombined seamlessly with new video layers to produce new videos. Evaluations show that OmnimatteZero not only achieves superior performance in terms of background reconstruction but also sets a new record for the fastest Omnimatte approach, achieving real-time performance with minimal frame runtime.
Co-Founder & CEO, Data Compass AI
Dori Gaton works at the intersection of AI R&D and delivery: consulting, leading projects, and shipping AI systems used in real workflows. He co-founded Data Compass AI seven years ago and leads it as CEO, supporting teams with full-scale execution or targeted advisory. He enjoys tackling the hard parts - domain shift, noisy labels, and ambiguous ground truth - across computer vision, multimodal models, and trustworthy AI. At Proofig AI, where he serves as Chief AI Officer (CAIO), he has worked with the team for the past five years on production systems that help safeguard research integrity.
The Scientific Image Arms Race: How We Detect AI Generated Figures in the Wild
Generative AI is lowering the barrier to producing plausible scientific figures, creating new risks for research integrity. We present a production-oriented approach to detecting AI-generated images in academic papers, focusing on microscopy - a domain with biological and imaging “rules,” domain-specific textures, and noise. In an internal survey, domain experts found that distinguishing real from generated imagery is very challenging, even side-by-side. We describe a deep learning classifier trained on real literature images and synthetic images generated at scale via image-to-image and image-to-text-to-image pipelines, with publication-like compression and figure distortions. We close with lessons on the generator-detector arms race and limited explainability.
Senior Applied Researcher, Wix
Irit Chelly is a PhD graduate from the Computer Science Department at Ben-Gurion University, where she also earned her M.Sc., under the supervision of Prof. Oren Freifeld and Dr. Ari Pakman in the Vision, Inference, and Learning group. Her research focuses on probabilistic clustering using non-parametric Bayesian models and unsupervised learning. Her previous projects involved spatial transformations, dimensionality reduction in video analysis, and generative models. Irit won the national-level Aloni PhD Scholarship from Israel’s Ministry of Technology and Science, as well as the BGU Hi-Tech Scholarship for outstanding PhD students, and received annual awards and instructor rank for excellence in teaching core Computer Science courses.
Consistent Amortized Clustering via Generative Flow Networks
Neural models for amortized probabilistic clustering yield samples of cluster labels given a set-structured input, while avoiding lengthy Markov chain runs and the need for explicit data likelihoods. Existing methods that label each data point sequentially, like the Neural Clustering Process, often lead to cluster assignments highly dependent on the data order. Alternatively, methods that sequentially create full clusters do not provide assignment probabilities. In this paper, we introduce GFNCP, a novel framework for amortized clustering. GFNCP is formulated as a Generative Flow Network with a shared energy-based parametrization of policy and reward. We show that the flow matching conditions are equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance. GFNCP also outperforms existing methods in clustering performance on both synthetic and real-world data. The talk is based on [Chelly et al., AISTATS '25].
Senior VP, Data Science, Zefr
Or Levi is an AI Researcher and Senior VP of Data Science at Zefr. He holds an M.Sc. (Magna Cum Laude) in Information Retrieval from the Technion, the Israel Institute of Technology. Or’s strongest passion is using AI for social impact, which led him to develop innovative AI to fight the spread of misinformation online. His work has been presented in leading AI conferences and covered by international media.
When AI Agents Should Ask for Help - Building Reliable Human–AI Systems
Large Language Models (LLMs) and AI Agents are increasingly deployed in high-stakes human–AI systems such as video content moderation. Yet a fundamental limitation remains: they tend to respond with confidence even when they are wrong, creating significant real-world risks.
The central challenge in deploying LLMs and Agents is not maximizing autonomy, but enabling systems to recognize when an Agent should not be trusted. To address this, we introduce a trust-aware framework for human–AI collaboration in which a judge model predicts whether the LLM output should be trusted or escalated to a human.
Our approach relies on LLM Performance Predictors (LPPs) derived directly from LLM outputs, capturing confidence signals, self-reported uncertainty, and indicators of missing evidence or ambiguous decision rules. Evaluated on a large-scale multimodal moderation benchmark, our method improves performance while reducing unnecessary human intervention. These results suggest that reliable AI systems are built not by replacing humans, but by enabling models to know when to ask for human judgment.
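A toy version of such a judge can be built with any calibrated classifier over output-derived features. The feature names and threshold below are illustrative, not the production LPPs.

```python
# Toy trust-aware router: a lightweight judge predicts from
# output-derived features whether to trust the LLM or escalate to a
# human. Features, data, and threshold are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features per answer: [self-reported confidence, answer length, cited-evidence flag]
X = np.array([[0.9, 120, 1], [0.4, 15, 0], [0.8, 90, 1], [0.3, 40, 0]])
y = np.array([1, 0, 1, 0])                    # 1 = LLM was correct (trust), 0 = escalate

judge = LogisticRegression().fit(X, y)

def route(features, threshold=0.7):
    p_trust = judge.predict_proba([features])[0, 1]
    return "auto-accept" if p_trust >= threshold else "escalate to human"

print(route([0.5, 30, 0]))
```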
Data Scientist, Zefr
Or Bachar is a Data Scientist at Zefr and an M.Sc. student in Machine Learning and Data Science at Reichman University, focusing on reliable machine learning and computer vision systems, with an emphasis on model uncertainty and human–AI collaboration.
VP of Research, Lightricks
Building Production Video Applications with LTX-2 and IC-LoRA
Senior Algorithms Researcher, Trigo
Efficient Multi-Camera Multi-Person Tracking in Sparse CCTV Layouts for Loss Prevention in Retail Spaces
Senior Machine Learning Engineer, Nexar
Roni Goldshmidt is a Senior AI Researcher building real-time video foundation models for autonomous driving and safety-critical systems. He leads BADAS at Nexar, a world-model collision-prediction system built on real-world dashcam data, and works on video world models, video-to-video generation, and VLMs, publishing research and contributing to open source in explainable AI for safety-critical applications.
BADAS: Context-Aware Ego-Centric Collision Prediction Using Real-World Dashcam Data
Research Scientist, IBM Research
Roi is a Research Scientist at IBM Research, Vision & Learning Technologies Group, where he focuses on multimodal embeddings, retrieval-augmented generation (RAG), vision-language models (VLMs), and LLMs. With experience spanning classical computer vision to modern deep learning, he brings a broad perspective to AI research. He holds an M.Sc. and B.Sc. in Electrical Engineering, both from the Technion – Israel Institute of Technology.
Real-World Multi-Modal RAG: Innovative Benchmarking and Efficient Visual Document Retrieval
Enterprise documents carry critical information in their visual layout, not just their text. As RAG systems evolve to handle these multi-modal documents, new challenges emerge around evaluation, retrieval quality, and production-scale efficiency. In this talk, I will present our team's recent work across three directions. First, building realistic benchmarks for multi-modal RAG that reflect real enterprise needs. Second, training vision-language based document retrievers that capture layout and visual semantics beyond text extraction. Third, our findings on redundancy in multi-vector document representations and how this insight enables significantly more efficient retrieval at query time.
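For context, multi-vector retrievers typically score documents with ColBERT-style late interaction (MaxSim), which is where per-vector redundancy becomes expensive. The sketch below shows the scoring rule and a naive pruning heuristic; it is an illustration, not the team's actual method.

```python
# Late-interaction (MaxSim) scoring for multi-vector retrieval, plus a
# naive redundancy-pruning heuristic; dimensions and the pruning rule
# are illustrative stand-ins.
import torch

q = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)    # query token vectors
d = torch.nn.functional.normalize(torch.randn(300, 128), dim=1)   # document patch vectors

def maxsim(q, d):
    return (q @ d.T).max(dim=1).values.sum()   # best doc match per query vector, summed

# naive pruning: drop doc vectors nearly duplicated by an earlier one
sim = d @ d.T
keep = [i for i in range(len(d)) if not (sim[i, :i] > 0.95).any()]
print(maxsim(q, d), maxsim(q, d[keep]), f"{len(keep)}/{len(d)} vectors kept")
```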
Head of Computer Vision, Skana Robotics
Ori holds a BSc in Electrical Engineering from Ben-Gurion University of the Negev (Israel) and an MSc in Marine Technologies from the Hatter Department of Marine Technologies at the University of Haifa (Israel), graduating on the Dean’s Honor List. During his MSc, he published two papers, including a NeurIPS 2025 paper co-authored with Prof. Tali Treibitz and Dr. Dan Rosenbaum, in which he addressed refraction-induced water-surface distortions using unsupervised, physics-constrained deep learning. In industry, Ori reduced training-data requirements for a semantic-segmentation model serving thousands of client API calls and owned deep-learning pipelines end to end—from research to deployment. He now heads Computer Vision at Skana Robotics, developing robust, real-world perception systems for marine environments.
Looking Into the Water by Unsupervised Learning of the Surface Shape
We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training. We show that using implicit neural representations with periodic activation functions (SIREN) leads to effective modeling of the surface height spatio-temporal signal and its derivative, as required for image reconstruction. Using both simulated and real data we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.
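The SIREN building block referenced above is a linear layer followed by a scaled sine, with the initialization of Sitzmann et al. A minimal sketch follows; it is not the authors' full two-network model.

```python
# Minimal SIREN-style layer: sine activation with frequency scaling w0
# and the initialization from Sitzmann et al.; used here to model a
# smooth height field h(x, y, t) with well-behaved derivatives.
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, w0=30.0, first=False):
        super().__init__()
        self.w0, self.linear = w0, nn.Linear(in_f, out_f)
        bound = 1 / in_f if first else math.sqrt(6 / in_f) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

height_net = nn.Sequential(SineLayer(3, 256, first=True), SineLayer(256, 256),
                           nn.Linear(256, 1))
h = height_net(torch.rand(1024, 3))   # water-surface height at sampled (x, y, t)
```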
Senior Lecturer, The Max Stern Yezreel Valley College
Murad M. Badarna received his B.Sc. in Information Systems from the University of Haifa, his M.Sc. in Computer Science from the University of Haifa, and his Ph.D. in Machine Learning from the University of Haifa. He is currently a member of the Department of Information Systems at both the University of Haifa and the Max Stern Yezreel Valley College, and he also serves as a lecturer in the Department of Industrial Engineering and Management at Braude College.
Dr. Badarna’s primary research interests lie in the field of machine learning, with a particular focus on selective sampling, active learning, and deep learning. In addition to his academic work, he is actively involved in the high-tech industry. He serves as the Head of the Research and Development Department at xBiDa, a company that provides a combination of advanced video analytics technology and data science services.
Scaling Convolutional Neural Networks for Tabular Data via Correlation-Based Image Transformations
Senior Lecturer (Assistant Professor), Ben-Gurion University
Dr. Yehuda Dar is a Senior Lecturer (Assistant Professor) in the AI Research Institutes of the Computer and Information Science Faculty at Ben-Gurion University. He and his research group work in the area of machine and deep learning. Yehuda holds a BSc in Computer Engineering, MSc in Electrical Engineering, and a PhD in Computer Science, all from the Technion. He had postdoctoral positions at the Technion, and in the Department of Electrical and Computer Engineering at Rice University. In addition to his research, Yehuda teaches machine learning courses at BGU and acts as area chair at leading machine learning conferences such as NeurIPS and ICML.
Machine Unlearning of Deep Neural Networks: The Effect of Overparameterization
M.Sc. Student, Technion - Israel Institute of Technology
Pseudo-Invertible Neural Networks
VP R&D, Visual Layer
Why Visual Data Breaks LLMs: Solving the 50M Token Problem
University of Haifa
A label-efficient active learning framework for medical image segmentation: from cold start to increased scale