Or Kozlovsky

Senior AI Applied Researcher Bluewhite Robotics

Bio:

Noa Cahan is a PhD candidate in Electrical Engineering at Tel Aviv University, advised by Prof. Hayit Greenspan. She holds both a BSc and an MSc in Electrical Engineering from Tel Aviv University. Noa's research focuses on deep learning and computer vision in medical imaging, with a particular interest in integrating diverse data modalities such as imaging, free text, and structured tabular data for medical prognosis, as well as developing cross-modal translation models using generative AI. Noa has been awarded ISF research grants, and her work has been published in leading journals such as Scientific Reports and npj Digital Medicine, and presented at top conferences including MICCAI and NeurIPS. Prior to her PhD, Noa worked at Amazon and Qualcomm.

Title:

Leveraging Diffusion Models towards PE Early Diagnosis using CXRs

Abstract:

Patients with respiratory issues in emergency rooms typically undergo chest X-rays (CXR), which are accessible and low-cost but provide limited low-resolution imaging. Higher-risk patients are referred for more detailed and expensive CT or Computed Tomography Pulmonary Angiography (CTPA) scans, which involve higher radiation. The study focuses on detecting Pulmonary Embolism (PE), usually invisible in CXR but detectable in CTPA.

By leveraging paired CXR–CTPA data, we investigate two complementary diffusion-based strategies that transfer diagnostic knowledge from the high-fidelity CTPA modality to the widely available CXR domain. In the first, a conditional diffusion model is trained to generate 3D CTPA-like representations directly from 2D CXRs, enriching the initial imaging with high-resolution vascular cues and improving PE detection performance from 69% to 80% AUC. In the second, we introduce a latent-space diffusion prior that performs cross-modal knowledge distillation, generating CTPA-informed classifier embeddings from CXR embeddings without explicit image synthesis, enabling state-of-the-art PE classification using CXR alone.

Together, these approaches demonstrate that diffusion models can act as powerful cross-modal bridges, either through image generation or embedding-level supervision, substantially enhancing early PE diagnosis from CXRs while reducing reliance on expensive, high-radiation imaging. Although not a replacement for clinical CTPA, this framework highlights a scalable and generalizable pathway for augmenting low-cost imaging with high-level diagnostic insight.

Our contributions through these works are as follows: (1) the first true CXR→CTPA diffusion pipeline with diagnostic validation; (2) a novel 1D diffusion prior for CXR→CTPA embedding distillation; (3) state-of-the-art CXR-based PE classification; (4) a modality-agnostic framework extendable to other cross-modal imaging tasks, facilitating wider access to advanced diagnostic tools.

Ran Itay

Algorithm Developer Applied Materials

Bio:

Ran Itay is an algorithm developer at Applied Materials, working in the Process Diagnostics and Control business unit. He holds a Ph.D. in physics from the Department of Particle Physics and Astrophysics at the Weizmann Institute of Science in Israel. Ran began working in machine learning during his postdoctoral research at the Stanford Linear Accelerator Center (SLAC) in California, USA, where he led the deep learning group in the MicroBooNE experiment, in the field of neutrino physics. In his current role, Ran focuses on developing deep learning and classical algorithmic solutions across various domains, including metrology, defect detection, and physical simulation.

Title:

Warp and Render: A Dual-Network Framework for Geometry-Controlled Simulation in Semiconductor Process Diagnostics

Abstract:

SEM images used in semiconductor manufacturing pose significant challenges for vision models because labels are scarce, geometric accuracy must be maintained at sub-pixel levels, and the domain gap from natural images is substantial. To address these limitations, we introduce Warp and Render, a dual-network framework that separates geometric structure from visual appearance and enables controlled, design-guided simulation. A deformation-prediction network aligns a reference layout to the observed image, and a rendering network generates realistic SEM-like appearance from the aligned geometry. The approach generalizes across diverse pattern types and imaging conditions, remains effective in low-data regimes, and preserves strong geometric consistency, supporting high-accuracy industrial SEM-image analysis.

Tom Hirshberg

Data Scientist Microsoft

Bio:

Tom Hirshberg is a data scientist at Microsoft in the Edge AI group, where she develops multimodal AI systems for large-scale video understanding. Previously, she was a research intern at Microsoft in Redmond, focusing on optimization and control methods for autonomous robotic systems.

Tom holds a BSc and an MSc (cum laude) in Computer Science from The Technion. During her studies, she was part of the algorithm team that developed the first student-built autonomous formula race car at The Technion. Her master’s thesis explored acoustic-based indoor localization for drones, bridging signal processing, machine learning, and robotics.

Title:

Object Detection and Tracking in Live Streams Using Textual and Visual Detailed Descriptions

Abstract:

In live video analysis, everything must happen quickly, efficiently, and accurately. While traditional object detection systems rely on predefined classes, modern applications require the flexibility to describe, detect, and track any object in live video streams. This brings algorithmic and computational challenges, especially on edge devices, such as handling detailed attributes (e.g., “a red vintage car”), integrating specialized trackers, and managing high camera loads efficiently.

This lecture presents an algorithm for detecting and tracking objects in live video streams using a detailed textual description, image examples, or both. Our approach is already deployed in Microsoft’s Azure AI Video Indexer.

Eli Schwartz

Research Manager IBM Research

Bio:

Dr. Eli Schwartz is Research Manager of Multimodal AI at IBM Research. His research focuses on vision-language foundation models and learning with limited data. Eli earned his PhD from Tel Aviv University and has authored more than 30 papers and more than 10 patents. Before joining IBM Research, he co-founded Inka Robotics, working on autonomous robotics, and worked at Microsoft developing computer vision algorithms for AR/VR.

Title:

Adaptive Resolution Processing in Vision-Language Models

Abstract:

Modern vision-language models face a fundamental accuracy-efficiency trade-off with high-resolution inputs. This talk presents four approaches to adaptive resolution across two architectural paradigms. For contrastive encoders, WAVECLIP enables coarse-to-fine processing via wavelet tokenization with early exits, while CLIMP uses Mamba architectures for natural variable-resolution support. For decoder-based VLMs, CARES predicts minimal sufficient resolution with a lightweight preprocessor (80% compute reduction), while ZoomCall trains models to selectively fetch high-resolution crops via tool-calling and reinforcement learning. These complementary strategies—progressive refinement, learned preprocessing, and agent-based reasoning—enable dynamic accuracy-efficiency trade-offs within deployed models.

Or Grenberg

AI Researcher General Motors

Bio:

I am a PhD candidate at The Hebrew University of Jerusalem, advised by Prof. Dani Lischinski, and a Senior Researcher at General Motors. My research focuses on image and video generation and manipulation, with a particular interest in adverse viewing conditions and out-of-distribution (OOD) concepts, primarily related to automotive scenarios.

Title:

Seed-to-Seed: Unpaired Image Translation in Diffusion Seed Space

Abstract:

Self-supervised video foundation models have recently shown strong transferability across natural video tasks, yet their applicability to domains with radically different spatiotemporal statistics remains largely unexplored. We investigate whether long-video masked autoencoders (LV-MAE), originally designed for low-frame-rate semantic natural videos, can be adapted to high-speed coherent imaging that lacks semantic structure. We apply LV-MAE to speckle-pattern video [2-3] recordings captured at 1000 fps from the scalp overlying language-related cortex during silent speech tasks. Despite extreme differences in resolution, modality, and temporal scale, LV-MAE learns transferable representations that enable accurate downstream classification with minimal labeled data. Using leave-one-subject-out evaluation with one-minute subject-specific calibration, the proposed approach achieves strong cross-subject performance on millisecond-scale inputs. These results suggest that masked video representation learning can generalize beyond natural video, enabling efficient learning in specialized high-speed imaging domains.

Abstract Co-Authors:

Natalya Segal, Daniel Rubinstein, Moshe Bar, Zeev Zalevsky

Bio:

Oshri Naparstek is a Senior Research Scientist at IBM Research – Haifa, working on multimodal AI, document understanding, retrieval, and vision-language models.

His recent work focuses on multimodal retrieval-augmented generation, modality gaps in embedding spaces, late-interaction retrieval, and efficient multimodal representation learning.

Previously, he worked on OCR, key-value extraction, and document AI benchmarks, including KVP10k and related document understanding systems.

Oshri holds a PhD in Electrical Engineering and has a background in applied mathematics, distributed algorithms, wireless communications, and machine learning.

His research interests include representation learning, multimodal reasoning, and practical AI systems for real-world information access.

Title:

Real-World Multi-Modal RAG: Innovative Benchmarking and Efficient Visual Document Retrieval

Abstract:

Enterprise documents carry critical information in their visual layout, not just their text. As RAG systems evolve to handle these multi-modal documents, new challenges emerge around evaluation, retrieval quality, and production-scale efficiency.
In this talk, I will present our team's recent work across three directions. First, building realistic benchmarks for multi-modal RAG that reflect real enterprise needs. Second, training vision-language-based document retrievers that capture layout and visual semantics beyond text extraction. Third, our findings on redundancy in multi-vector document representations and how this insight enables significantly more efficient retrieval at query time.

Tamir Denis

AI Vision Team Lead BigBear.ai

Bio:

Tamir is an AI and Computer Vision Team Lead at BigBear.ai, leading the research and development of AI-powered vision systems for customs and cargo inspection. He has over six years of experience in computer vision, deep learning, anomaly detection, and multimodal AI, with previous roles at Samsung and Percepto. His work has spanned drone inspection systems, industrial AI, advanced imaging sensors, and large-scale vision applications. Tamir holds an M.Sc. in Computer Science from Tel Aviv University, where his research focused on non-invasive blood analysis using deep learning from eye videos, resulting in a publication in npj Digital Medicine, a journal within the Nature Portfolio.

Title:

Blood tests without needles? Practical challenges of building a deep learning pipeline using eye videos

Abstract:

Routine blood counts still require a needle and a lab. Can a short video of the eye replace the draw? The bulbar conjunctiva, the white area of our eyes, exposes microvessels directly to a camera, and their appearance carries signal about what flows through them. We briefly walk through both sides of the problem: the core challenges of turning raw video into a calibrated medical estimate, and the deep learning pipeline we built to get there, from capture and vessel extraction to blood count prediction.

Published in npj Digital Medicine (Nature Portfolio). A collaboration between Sheba Hospital and Tel Aviv University, with Prof. Lior Wolf, Prof. Haim Suchowski, Dr. Ifat Sher, and Prof. Ygal Rotenstreich.

Ori Besen

Electrical Engineer Tel Aviv University

Bio:

Ori Besen holds a B.Sc. in Electrical Engineering from Tel Aviv University's Iby and Aladar Fleischman Faculty of Engineering, graduating in 2026. He conducted his final-year project, Multi-Omics 5: Multimodal Breast Cancer Survival Prediction, under the supervision of Prof. Ilan Tsarfaty at TAU's Faculty of Medicine, building a 13-microservice Kubernetes platform that integrates seven biological modalities, spanning clinical, genomic, methylation, pathomics, and radiomics data, for survival prediction.

His work focuses on the engineering and machine learning challenges of fusing high-dimensional medical data: GAN-based histological normalization, prototype-based slide encoding, and selective multimodal fusion. In parallel, he serves as a Control Engineer at BLEnergy, developing Python automation systems. His broader interests lie in computer vision, medical imaging, and production-grade ML systems.

Title:

Integrating Multi-Omics, Pathomics, and Radiomics: A Unified Platform for Breast Cancer Translation

Abstract:

Breast cancer's biological heterogeneity limits the accuracy of single-modality prognostic models. We present a unified platform integrating seven biological modalities — clinical, mRNA, miRNA, DNA methylation, copy number variation, pathomics, and radiomics — through a 13-microservice Kubernetes architecture for breast cancer survival prediction.

The pathomics pipeline combines TRIDENT tissue segmentation, GAN-based stain normalization, CTransPath feature extraction, and PANTHER prototype encoding into a 16-dimensional slide representation. GAN normalization extended usable WSI coverage by 23% (to 1,052 patients), and the compact 16D representation outperformed 768D CTransPath embeddings (C-index 0.80 vs. 0.74), demonstrating that signal-to-noise optimization can matter more than embedding dimensionality. The radiomics pipeline applies TotalSegmentator across anatomical masks, extracts ~93,000 PyRadiomics features per scan, and distills them through a three-stage funnel to a 1,200-dimensional cross-mask-attention representation.

On TCGA-BRCA (n=1,094), selective fusion of Clinical+DNAm+Pathomics achieved test C-index 0.910 — 17.9% above the clinical baseline — with IBS 0.124 and tertile risk stratification at logrank p=1.7×10⁻⁶. This three-modality combination outperformed full seven-modality fusion, indicating that selective integration beats exhaustive combination. Univariate analysis of PANTHER prototypes identified mucin/myxoid morphology as a top prognostic signal, providing biological validation. A pan-cancer MRI sub-study (n=306, 11 cancer types) achieved C-index 0.836, with prognostic signal emerging from systemic organ features rather than tumor regions — an underexplored direction in radiomics.

The modular architecture enables independent modality development and pan-cancer scaling.
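
For readers unfamiliar with the C-index values quoted in this abstract, the sketch below shows one common way such a score can be computed: Harrell's concordance index, the fraction of comparable patient pairs that a risk score orders correctly. It is an illustration only and not the platform's evaluation code. The function name `concordance_index`, the toy data, and the simplification of skipping pairs with tied follow-up times are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (illustrative sketch, not the authors' code).

    times  -- observed follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- model risk score per patient (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Assumption: pairs with tied times are skipped for simplicity.
        # A pair is comparable only if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier failure has the higher risk score
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort of five patients (not data from the study).
toy_times = [5, 8, 12, 3, 9]
toy_events = [1, 0, 1, 1, 0]
toy_risks = [0.9, 0.4, 0.2, 0.7, 0.5]
print(round(concordance_index(toy_times, toy_events, toy_risks), 3))  # 0.857
```

A score of 0.5 corresponds to random ordering and 1.0 to perfect ordering. In the toy cohort, 6 of the 7 comparable pairs are ordered correctly.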