14th Israel Machine Vision Conference (IMVC) 2025

Tuesday|April 1, 2025

Pavilion 10, EXPO Tel Aviv

Agenda

IMVC 2025 will feature presentations by leading researchers in AI, with a focus on image and video processing, computer vision, machine learning, and deep learning. Attendees can expect visionary insights and cutting-edge developments from both academia and industry, showcasing the latest trends in artificial intelligence and its applications.

Exhibition

IMVC is the premier platform for companies shaping the future of AI. Discover the latest advancements in machine vision and machine learning, and connect with experts, entrepreneurs, developers, and engineers. Join us in Tel Aviv to forge collaborations, explore ongoing trends, and witness new applications in the field.

Topics

Computer Vision | Deep Learning |
Artificial Intelligence (AI) |
Generative AI (GenAI) | AutoML/MLOps |
Data Augmentation | Data Fusion |
Vision in Autonomous Systems |
Medical Imaging | Augmented Reality (AR) | Virtual Reality (VR) |
Pattern Recognition | Computational Photography |
Robotics | Embedded Vision | Image Formation |
Optics and Sensors | Metrology Vision |
Mathematical Foundations of DL |
...and many more

Keynote Speaker

Yonina Eldar

Department of Mathematics and Computer Science, The Weizmann Institute of Science

Bio:

Prof. Yonina C. Eldar is renowned for her pioneering contributions to signal processing and sampling theory. She currently serves as a Professor in the Department of Mathematics and Computer Science at the Weizmann Institute of Science, where she holds the Dorothy and Patrick Gorman Professorial Chair and leads the Center for Biomedical Engineering.

Prof. Eldar's innovative research has significantly advanced sub-Nyquist sampling techniques, enabling more efficient data acquisition across various applications, including medical imaging and communications. Her work has garnered numerous accolades, such as the IEEE Signal Processing Society Technical Achievement Award and the IEEE Kiyo Tomiyasu Award, underscoring her impact on the field.

Beyond her research, Prof. Eldar is dedicated to education and mentorship, having authored several influential books and over 475 journal articles. She is a member of the Israel Academy of Sciences and Humanities and a Fellow of both IEEE and EURASIP, reflecting her esteemed standing in the scientific community.

Her leadership extends to various academic and professional committees, where she has played a pivotal role in shaping the future of signal processing and its applications. Prof. Eldar's commitment to advancing technology and fostering innovation continues to inspire researchers and practitioners worldwide.

Title:

Model-based Learning

Abstract:

Nadav Cohen

CTO, President & Co-Founder, Imubit & Tel Aviv University

Bio:

Nadav Cohen is an Assoc. Prof. of Computer Science at Tel Aviv University, and CTO, President & Co-Founder at Imubit. His academic research centers on the foundations of deep learning, while at Imubit he leads development of deep reinforcement learning systems controlling manufacturing plants. Nadav earned a BSc in electrical engineering and a BSc in mathematics (both summa cum laude) at the Technion. He obtained his PhD (summa cum laude) at the Hebrew University, and was a postdoc in Princeton. For his contributions, Nadav won a number of awards, including an ERC Grant and a Google Research Scholar Award.

Title:

Offline Reinforcement Learning in the Wild

Abstract:

Ayellet Tal

Professor, Technion - Israel Institute of Technology

Bio:

Ayellet Tal is a professor and the Alfred and Marion Bär Chair in Engineering at the Technion's Department of Electrical and Computer Engineering. She holds a Ph.D. in Computer Science from Princeton University and a B.Sc. degree (summa cum laude) in Mathematics and Computer Science from Tel Aviv University. Among Prof. Tal's accomplishments are the Rechler Prize for Excellence in Research, the Henry Taub Prize for Academic Excellence, and the Milton and Lillian Edwards Academic Lectureship. Prof. Tal has chaired several conferences on computer graphics, shape modeling, and computer vision, including the upcoming ICCV.

Title:

Point Cloud Visualization – Why and How?

Abstract:

A point cloud, which is a set of 3D positions, is a simple, efficient, and versatile representation of 3D data. Given a point cloud and a viewpoint, which points are visible from that viewpoint? Since points themselves do not occlude one another, the real question becomes: which points would be visible if the surface they were sampled from were known? In this talk we will explore why point visibility is important, not only in computer vision but also beyond, how it can be determined, and in particular, how it can be addressed within optimization or deep learning frameworks.
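
For context, one classical answer to this question is the hidden point removal (HPR) operator of Katz, Tal, and Basri, which "flips" points onto a sphere around the viewpoint and keeps those that land on the convex hull. A minimal numpy/scipy sketch of that idea follows; parameter values are illustrative, not the talk's implementation:

```python
# Hedged sketch of the classical hidden-point-removal operator:
# spherical flipping around the viewpoint followed by a convex hull.
import numpy as np
from scipy.spatial import ConvexHull

def visible_points(points, viewpoint, radius_scale=100.0):
    """Estimate indices of `points` (N x 3) visible from `viewpoint` (3,)."""
    p = points - viewpoint                           # move viewpoint to origin
    norms = np.linalg.norm(p, axis=1, keepdims=True)
    R = radius_scale * norms.max()                   # radius of the flipping sphere
    flipped = p + 2.0 * (R - norms) * (p / norms)    # spherical flipping
    hull = ConvexHull(np.vstack([flipped, np.zeros((1, 3))]))  # include viewpoint
    return hull.vertices[hull.vertices < len(points)]  # drop the viewpoint itself
```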

Yovav Meydad

Chief Strategy & Product Officer, Mentee Robotics

Bio:

Yovav Meydad is the Chief Strategy and Product Officer at Mentee Robotics, leading the company’s product vision and strategic growth in the humanoid robotics industry. Previously, he served as the Chief Growth & Marketing Officer at Moovit (acquired in 2020 by Intel) where he helped scale the world’s leading transit app to over 1.5 billion users globally. With 20+ years of experience as a founder and product leader in venture-backed and public companies, Yovav also co-founded Pixplit and Hitpad and held leadership roles at Snap.com, Spark Networks, and AOL. He holds a B.Sc. in Information Systems Engineering from Ben Gurion University.

Title:

Abstract:

Amit Bermano

Tel Aviv University

Bio:

Title:

Abstract:

Aya Soffer

IBM

Bio:

Title:

IBM Granite Vision – the journey to develop a large enterprise-focused Visual Language Model

Abstract:

IBM has recently released a new family of open-source LLMs, Granite 3, focusing on enterprise use cases. In this keynote we talk about the development of the Visual Language Model member of this family, IBM's Granite Vision model, developed by IBM Research in Israel and in the US. We describe the process of developing Granite Vision and share insights and innovations from the different steps in this journey, such as data collection and generation, model training and evaluation, enterprise trust and safety requirements, and more. Additionally, we describe IBM's vision and future outlook on Large Vision Models and general Multimodal Models.

Daphna Laifenfeld

NeuroKaire

Bio:

Title:

Abstract:

Dean Leitersdorf

Co-Founder & CEO, Decart

Bio:

Dean Leitersdorf grew up between Israel, Switzerland and Silicon Valley. Dean completed his PhD at the Technion at the age of 23, while serving in Unit 8200, and later completed his postdoc at NUS Singapore.

Dean won the ACM PODC Dissertation Award in 2023 for the best PhD in distributed computing worldwide. Additional awards include three best student paper awards at PODC and the Israel Defense Prize from the IDF.

Dean serves as CEO of Decart, an efficiency-focused AI research lab, which he founded in 2023 with his co-founder, Moshe Shalev. Decart burst out of stealth in October 2024 with its demo, Oasis, a real-time, generative AI video game world. Decart aims to become the leading consumer AI company by helping users transform their imagination into visual reality, blending interactive, generative AI experiences into everyday life.

Decart's innovation lies in its groundbreaking AI platform, which reduces the cost of running and training AI models by ten times, a development that has put the company on the radar of global tech giants. The platform delivers real-time generative capabilities, including the creation of fully playable AI-generated video game worlds. This marks a transformative step forward in AI infrastructure.

Within its first year, Decart has garnered significant backing from industry giants like Sequoia Capital and Benchmark, reflecting its transformative potential. The company’s rapid growth and innovative technology position it as a serious contender alongside industry leaders like OpenAI. Dean’s vision and leadership have been pivotal in this success, cementing his role as a key figure shaping the future of artificial intelligence.

Title:

Turning Israel into a Global Leader in Building Foundation Models for Computer Vision

Abstract:

Speakers

Alon Faktor

Director of AI Research, Vimeo

Bio:

Alon Faktor is a Director of AI Research at Vimeo, the world's largest private video hosting platform. He works on innovative applications for video consumption and interaction, aiming to turn video content into a dynamic asset that increases value for Vimeo customers. Alon's research focuses on efficient multi-modal large language models and video-RAG techniques for video understanding and indexing. Alon holds a B.Sc. in Physics and Electrical Engineering from the Technion and an M.Sc. and PhD in Computer Science from the Weizmann Institute of Science.

Title:

From Passive Viewing to Active Dialogue: Redefining Video Consumption

Abstract:

In this lecture, we will present how Vimeo is harnessing the latest AI techniques to enable novel video viewing experiences. We will give a deep dive into our talk-to-video technology, which applies RAG and LLMs to the video domain. We will demonstrate our approach to extracting and indexing multimodal video information so that it can be effectively utilized within the RAG architecture. We will also present several applications that we have built on top of this method, such as video Q&A and library search.
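
As a rough illustration of the retrieval-augmented pattern described above (not Vimeo's actual pipeline), the core loop pairs an embedding index over multimodal video metadata with an LLM. Here `embed` and `llm` are assumed stand-ins:

```python
# Minimal sketch of video RAG: retrieve the most relevant indexed chunks
# (transcript snippets, captions, detected objects) and pass them as context.
import numpy as np

def answer(query, chunks, embed, llm, k=5):
    """chunks: list of indexed text snippets; embed returns unit vectors."""
    q = embed(query)
    sims = np.array([q @ embed(c) for c in chunks])   # cosine similarity
    context = [chunks[i] for i in np.argsort(-sims)[:k]]
    return llm(f"Context: {context}\nQuestion: {query}")
```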

Matan Levy

PhD Candidate, Hebrew University of Jerusalem

Bio:

A Computer Science Ph.D. candidate at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Dr. Rami Ben-Ari.

Title:

Gray-Box Fine-Tuning for Single Backbone Domain Experts

Abstract:

The emergence of foundational models has greatly improved performance across various downstream tasks, with fine-tuning often yielding even better results. However, existing fine-tuning approaches typically require access to model weights and layers, leading to challenges such as managing multiple model copies or inference pipelines, inefficiencies in edge device optimization, and concerns over proprietary rights, privacy, and exposure to unsafe model variants. In this paper, we address these challenges by exploring "Gray-box" fine-tuning approaches, where the model's architecture and weights remain hidden, allowing only gradient propagation. We introduce a novel yet simple and effective framework that adapts to new tasks using two lightweight learnable modules at the model's input and output. Additionally, we present a less restrictive variant that offers more entry points into the model, balancing performance with model exposure. We evaluate our approaches across several backbones on benchmarks for text-image alignment, text-video alignment, and sketch-image alignment. Our results demonstrate that, despite having limited access to the model, our Gray-box approaches achieve competitive performance with full-access fine-tuning methods.
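
A minimal PyTorch sketch of the Gray-box setup described above, assuming simple linear input/output modules (the paper's exact module design may differ): the backbone stays frozen and hidden, and gradients merely pass through it.

```python
# Hedged sketch: only two lightweight modules are trained; the backbone's
# weights are never updated or exposed, yet gradients flow through it.
import torch
import torch.nn as nn

class GrayBoxAdapter(nn.Module):
    def __init__(self, backbone, dim):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False        # frozen, potentially proprietary model
        self.pre = nn.Linear(dim, dim)     # learnable input module
        self.post = nn.Linear(dim, dim)    # learnable output module

    def forward(self, x):
        # Gradient propagation through the frozen backbone reaches self.pre,
        # but no backbone weight is trained.
        return self.post(self.backbone(self.pre(x)))
```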

Ofer Rosenberg

AI System/Software Architect, Qualcomm

Bio:

Since 2008, Ofer has been involved in Compute and Heterogeneous environments, contributing to the OpenCL Specification while working on CPU-based Compute and GPGPU. In 2016, he began focusing on AI at Qualcomm, when it was still called "Machine Learning". He initially worked on the Snapdragon Neural Processing Engine (SNPE) before becoming the Software Architect for Qualcomm's Cloud AI100. Over the past year, Ofer has concentrated on enabling Generative AI across the stack, from conversion tools to the newly created Genie (Generative AI Inference Engine) add-on to the QAIRT SDK.

Title:

Generative AI on Mobile Devices: Challenges and Innovations

Abstract:

The session explores the advancements and challenges of implementing Generative AI on mobile devices. It starts by emphasizing the benefits of running Generative AI on mobile platforms, including enhanced privacy, personalization, performance, and offline support. It then describes the characteristics of a typical mobile platform, such as the Snapdragon 8 Elite, and the limitations it introduces, such as memory bandwidth and power/performance tradeoffs. The session highlights innovative solutions that enhance the performance and accuracy of models running on mobile devices, including quantization techniques, speculative decoding, and Low-Rank Adaptation (LoRA).
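
As one concrete example of the techniques listed above, here is a minimal sketch of Low-Rank Adaptation (LoRA): a frozen pretrained weight is augmented with a trainable low-rank update that is cheap to store and swap on-device. Shapes and hyperparameters are illustrative.

```python
# Hedged LoRA sketch: W_base stays frozen; only the low-rank factors train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```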

Danny Atsmon

Founder & CEO, Cognata

Bio:

Danny is an expert in ADAS/autonomous cars and deep learning, with a track record of more than 20 years. He served as Harman's Director of ADAS/Autonomous Cars and Senior Director of Machine Learning. He co-founded two start-up companies, Picitup and iOnRoad. Danny holds several United States utility patents and has created a pipeline of dozens of patent-pending applications. Danny is a graduate of the prestigious Israel Defense Forces (IDF) Talpiot program, served in the elite Unit 8200, and holds a B.Sc. degree in Physics from Tel Aviv University.

Title:

Supervised Generative AI – Real Data Augmentation with DriveMatriX

Abstract:

Cognata introduces DriveMatriX, an advanced supervised generative AI platform designed to transform real-world images and videos into diverse training and validation datasets. By leveraging a foundation of real-world data, DriveMatriX maximizes coverage through controlled, repeatable, and predictable augmentations. The platform simulates complex environmental conditions—such as adverse weather, varying illumination, and dynamic urban landscapes—ensuring robust AI model performance. Integrated validation mechanisms streamline AI training workflows, improving precision, recall, and edge-case performance. DriveMatriX sets a new benchmark in generative AI by combining realism with control, accelerating development cycles while maintaining dataset fidelity and consistency.

Tamar Kashti

AI Hub Research Manager, Weizmann Institute of Science

Bio:

Tamar Kashti is a seasoned leader in algorithms and AI with over 15 years of research experience across academia and industry. She specializes in deep learning, computer vision, and image processing. Tamar currently manages the AI Hub at the Weizmann Institute, where she oversees the AI internship program for students and research fellows and advances cutting-edge medical imaging projects. Dr. Kashti has led algorithmic teams at Landa Labs and HP Indigo, pioneering innovations in calibration and print technologies, earning 7 patents. She has authored 15 published papers and holds a Ph.D. in theoretical physics from the Weizmann Institute.

Title:

Deep Denoising of Multiplexed Mass-based Images: Supervised vs. Self-supervised

Abstract:

Multiplexed mass-based imaging (MBI) technologies offer transformative insights into cellular diversity but are often hindered by significant noise. This study presents a deep learning-based approach to denoise MBI data, comparing supervised and self-supervised methods. Both approaches effectively address noise-related artifacts, with the supervised method excelling after fine-tuning for specific datasets, while the self-supervised approach demonstrates strong generalization across diverse data. These methods significantly reduce manual effort, delivering high-quality, denoised images within minutes. By enhancing image usability and analysis efficiency, this automated solution accelerates the adoption of MBI technologies for researchers and clinicians in biological and clinical studies.

Ron Shapira Weber

Ph.D. Student, Ben Gurion University

Bio:

Ron Shapira Weber is a Ph.D. student at Ben Gurion University (BGU) in the Vision, Inference, and Learning (VIL) group, under the supervision of Dr. Oren Freifeld at the Computer Science Dept. His interest areas include time series analysis and computer vision, with applications to time series joint alignment and averaging, image registration, and video analysis. He did his master's in Cognitive Science at BGU as part of the VIL group, under Dr. Oren Freifeld, and of the Computational Psychiatry Lab, under Dr. Oren Shriki. Between 2019 and 2021 he worked as an algorithm researcher at BeyondMinds.

Title:

SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images

Abstract:

Unsupervised Joint Alignment (JA) of images faces high complexity, geometric distortions, and convergence to poor optima. Vision Transformers provide powerful features but do not fully resolve these challenges, leading to reliance on expensive models with heavy regularization and extensive hyperparameter tuning. We propose the Spatial Joint Alignment Model (SpaceJAM), a compact architecture with ~16K trainable parameters requiring no regularization. Evaluations on SPair-71K and CUB demonstrate that SpaceJAM matches state-of-the-art performance while offering at least a 10x speedup. By setting a new standard for rapid, effective image alignment, SpaceJAM makes JA more accessible and efficient. Code is available at: https://bgu-cs-vil.github.io/SpaceJAM/

Rotem Benisty

MSc Student, Technion - Israel Institute of Technology

Bio:

Rotem is an MSc student in Electrical and Computer Engineering at the Technion, where she is jointly mentored by Dr. Moti Freiman from the Faculty of Biomedical Engineering and Prof. Moshe Porat from the Faculty of Electrical and Computer Engineering. She holds a B.Sc. in Electrical and Computer Engineering from the Technion. Rotem’s research focuses on advancing isotropic MRI restoration from anisotropic data through innovative generative multi-plane deep learning techniques. Alongside her research, Rotem works as an engineer in the networking department at Nvidia, bringing several years of practical industry experience.

Title:

SIMPLE: Simultaneous Multi-Plane Self-Supervised Learning for Isotropic MRI Restoration from Anisotropic Data

Abstract:

Magnetic resonance imaging (MRI) is essential for medical diagnostics but often produces anisotropic data with varying resolutions, limiting volumetric analysis and diagnostic precision. Current super-resolution methods primarily interpolate missing slices, relying on indirect mappings and limited isotropic data, without fully leveraging MRI's inherent 3D structure. We introduce "SIMPLE," a Simultaneous Multi-Plane Self-Supervised Learning approach for isotropic MRI restoration from anisotropic data. SIMPLE utilizes multi-plane clinical scans to enhance slice quality while addressing 3D structural information to generate realistic isotropic volumes. Experiments on brain and abdominal datasets demonstrate SIMPLE’s superiority over state-of-the-art methods, improving volumetric analysis, 3D reconstructions, and clinical diagnostics.

Idan Abudi

Algorithm Researcher, 4M Analytics

Bio:

Idan Abudi is an Algorithm Researcher at 4M Analytics, a company specializing in providing cutting-edge AI solutions for mapping underground infrastructure on a large scale. Before joining 4M Analytics, Idan served as a researcher and commander at Unit 9900 for almost 9 years, mainly focusing on geospatial intelligence, with expertise in geographic and visual data research. He holds an MSc in Financial Math from Bar-Ilan University.

Title:

Utilizing aerial imagery for utility object detection in infrastructure mapping

Abstract:

Underground infrastructure includes critical utility systems like water, gas, electricity, and communication networks, as well as transportation tunnels and structural foundations. Accurate mapping of these hidden systems is essential to prevent costly damages and ensure effective project planning. At 4M Analytics, we introduce a groundbreaking solution that harnesses aerial imagery from planes, integrating state-of-the-art object detection with domain-specific innovations. Our innovative approach addresses significant challenges, such as object variability and resolution limitations.

Yael Sde-Chen

Algorithm Developer, Applied Materials

Bio:

Yael is an Algorithm Developer at Applied Materials, driving cutting-edge solutions in the field of semiconductor process control. She holds an M.Sc. in Electrical Engineering from the Technion. Yael has 5 years of industry experience from Applied Materials, Amazon, and the IDF, with a focus on deep learning methods for semantic segmentation, data augmentation, and generative models.

Title:

Repair Blind Spots in Semantic Segmentation

Abstract:

Semantic segmentation of images is crucial in AI expert systems. The Unet network architecture is widely considered the most practical choice for CNN solutions in application-specific semantic segmentation.

Despite its success, the Unet architecture has inherent problems that prevent true shift-equivariance and single-pixel-level accuracy, as we demonstrate.

Here we propose an alternative CNN architecture, offering higher accuracy and efficiency, specifically designed for mission-critical applications where each pixel is crucial, such as anomaly detection in medical imaging, industrial quality control, and more.
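
A toy, self-contained demonstration of the shift-equivariance problem alluded to above (not the speaker's example): a one-pixel shift of the input can be completely invisible to a stride-2 max-pool, so no decoder can recover pixel-accurate localization afterwards.

```python
# The pooled outputs of an image and its 1-pixel shift can be identical,
# so strided pooling (as in Unet encoders) discards the sub-stride shift.
import torch
import torch.nn.functional as F

x = torch.zeros(1, 1, 1, 8)
x[..., 2] = 1.0                       # impulse at position 2
shifted = torch.roll(x, 1, dims=-1)   # same impulse, shifted by one pixel
same = torch.equal(F.max_pool2d(x, (1, 2)), F.max_pool2d(shifted, (1, 2)))
print(same)  # True: the shift is invisible after stride-2 pooling
```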

Rajaee Khateb

PhD Student, Tel Aviv University

Bio:

I am an Electrical Engineering Ph.D. student at Tel-Aviv University, advised by Prof. Raja Giryes.

My research is in the area of Artificial Intelligence, focusing on the use of 3D data. My goal is to explore new 3D representations and their integration with Generative AI methods such as Diffusion Models. Our vision is to use this unique integration to take current 3D solutions a step forward.

Previously, I obtained a B.Sc. (Cum Laude) and M.Sc. from the department of Computer Science at the Technion, where I was advised by Prof. Michael Elad. My M.Sc. thesis was about unfolding greedy sparse pursuit algorithms into deep neural networks.

Title:

TriNeRFLet: A Wavelet Based Triplane NeRF Representation

Abstract:

In recent years, the neural radiance field (NeRF) model has gained popularity due to its ability to recover complex 3D scenes. Following its success, many approaches proposed different NeRF representations in order to further improve both runtime and performance. One such example is Triplane, in which NeRF is represented using three 2D feature planes. This enables easily using existing 2D neural networks in this framework, e.g., to generate the three planes. Despite its advantage, the triplane representation lagged behind NeRF solutions in 3D recovery quality. In this work, we propose the TriNeRFLet framework, where we learn the wavelet representation of the triplane and regularize it. This approach has multiple advantages: (i) it allows information sharing across scales and regularization of high frequencies; (ii) it facilitates performing learning in a multi-scale fashion; and (iii) it provides a 'natural' framework for performing NeRF super-resolution (SR), such that the low-resolution wavelet coefficients are computed from the provided low-resolution multi-view images and the high frequencies are acquired under the guidance of a pre-trained 2D diffusion model. We show the SR approach's advantage on both Blender and LLFF datasets.
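
A hedged sketch of the core idea, assuming a single-level Haar transform for simplicity (TriNeRFLet itself is multi-scale): each triplane feature plane is stored as learnable wavelet coefficients, synthesized by an inverse transform, with the high-frequency bands available for regularization.

```python
# Illustrative single-level Haar parameterization of one triplane plane.
import torch
import torch.nn as nn

def inv_haar(ll, lh, hl, hh):
    # Single-level 2D inverse Haar transform (one orthonormal convention).
    tl = (ll + lh + hl + hh) / 2
    tr = (ll - lh + hl - hh) / 2
    bl = (ll + lh - hl - hh) / 2
    br = (ll - lh - hl + hh) / 2
    top = torch.stack((tl, tr), dim=-1).flatten(-2)            # interleave columns
    bottom = torch.stack((bl, br), dim=-1).flatten(-2)
    return torch.stack((top, bottom), dim=-2).flatten(-3, -2)  # interleave rows

class WaveletPlane(nn.Module):
    """One triplane feature plane, parameterized by wavelet coefficients."""
    def __init__(self, channels=16, size=128):
        super().__init__()
        h = size // 2
        self.coeffs = nn.ParameterList(
            [nn.Parameter(torch.zeros(channels, h, h)) for _ in range(4)])

    def forward(self):
        ll, lh, hl, hh = self.coeffs
        return inv_haar(ll, lh, hl, hh)   # 2D feature plane queried by the NeRF MLP

    def detail_penalty(self):
        # L1 on the high-frequency bands: the regularization the abstract mentions
        return sum(c.abs().mean() for c in list(self.coeffs)[1:])
```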

Bella Specktor Fadida

Lecturer, University of Haifa

Bio:

Bella leads the Healthcare Computer Vision (hVision) lab in the Medical Imaging Sciences department at the University of Haifa. She earned her PhD from the Hebrew University, specializing in medical imaging, and previously spent seven years developing medical imaging algorithms at Philips. Bella serves as a board member of Women in MICCAI (WiM) and is the founder and organizer of the Machine Learning for Medical Imaging (MLMI) and Haifa Machine Learning meetups.

Title:

Fetal weight estimation using deep learning-based segmentation

Abstract:

This work presents a deep-learning method for whole-body fetal segmentation from MRI and evaluates its repeatability, reproducibility, and accuracy. A normal MRI-based fetal weight growth chart was created for the first time, and sensitivity in detecting fetal growth restriction (FGR) was assessed. Retrospective data from 348 fetuses were analyzed, with assessments for repeatability (n=22), reproducibility (n=6), and accuracy (n=7). The model achieved high segmentation accuracy (Dice=0.973), strong agreement with the gold standard, and reliable fetal weight estimation. The MRI-based growth chart was consistent with ultrasound charts and accurately identified FGR cases, demonstrating its potential for fetal growth assessment.

Georgy Melamed

Principal Scientist, Walmart Global Tech

Bio:

Georgy Melamed is a principal scientist at Walmart Global Tech Israel. He has over 20 years of experience in CVML domains across the defense industry, startups, and corporations. He holds an M.Sc. in Electrical Engineering from Tel Aviv University and is an alumnus of the Talpiot program.

Title:

Generative video virtual try-on

Abstract:

Visualizations are critical in e-commerce, both for marketing and for matching customer-vendor expectations. The particular use case of garment virtual try-on underwent tremendous growth in the past decade, accelerating exponentially with the present leap of generative models. The specific differentiator between an impressive presentation and a real-life solution is the trustworthiness of the result, mitigating the customer's risk during a remote purchase.

Video is the next frontier for virtual try-on. In this talk we present an evolution of Zeekit's historic SOTA static solution, acquired by Walmart several years ago, into a contemporary video try-on feature, leveraging recent technology.

Sharon Peled

M.Sc. Student, Technion - Israel Institute of Technology

Bio:

Sharon is an M.Sc. student at the Technion's Data and Decisions Faculty, co-advised by Dr. Moti Freiman and Dr. Yosi Maruvka. 

His research focuses on developing Multiple Instance Learning (MIL) models for gigapixel histopathological images (Whole Slide Images), with an emphasis on creating scalable solutions for clinical diagnostics.

In addition to his academic work, Sharon has several years of experience in signal processing and machine learning research and currently serves as a senior lead researcher at the IDF.

Title:

PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification

Abstract:

Whole slide images are gigapixel-sized digital scans of tissue samples, designed to capture intricate cellular and morphological patterns. Their immense size necessitates division into smaller tiles, typically analyzed as instances within a Multiple Instance Learning (MIL) framework. However, this formulation often overlooks spatial relationships between instances, which are crucial for capturing complex tissue structures during a histopathological examination. To this end, we present a novel attention-based MIL framework that utilizes a probabilistic interpretation of self-attention to dynamically infer spatial dependencies during training. Our approach eliminates the need for predefined spatial assumptions, enabling flexible spatial modeling and achieving state-of-the-art performance.
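
For background, a minimal sketch of the attention-based MIL family this work extends (the probabilistic spatial component of PSA-MIL is not reproduced here); shapes are illustrative:

```python
# Generic attention-based MIL pooling: each tile embedding gets a learned
# weight, and the slide-level representation is their weighted sum.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tiles):                       # tiles: (n_tiles, dim)
        w = torch.softmax(self.attn(tiles), dim=0)  # attention over instances
        slide = (w * tiles).sum(dim=0)              # bag-level embedding
        return self.head(slide)                     # slide-level prediction
```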

Rami Ben-Ari

Principal Research Scientist, OriginAI

Bio:

Rami Ben-Ari is a Principal Research Scientist and Technical Lead at OriginAI and an Adjunct Professor in the CS-EE Faculty at Bar-Ilan University. Actively engaged in the academic community, he co-supervises MSc and PhD students and has authored over 50 papers, along with numerous patents. His research focuses on deep learning techniques for image retrieval, multimodal learning, video understanding, and generative models. He holds a PhD in Applied Mathematics from Tel-Aviv University, specializing in computer vision.

Title:

Fast image inversion and editing with diffusion models

Abstract:

Emerging diffusion models have demonstrated impressive capabilities in generating images from textual prompts and sampled random noise, commonly referred to as a seed. A common approach to editing a real image involves adjusting the prompt while recovering the image's corresponding seed, a process known as image inversion.

In this talk, I will introduce a novel and efficient solution for image inversion and editing with diffusion and flow matching models, using the well-known Newton-Raphson numerical scheme. Our method, Guided Newton-Raphson Inversion (GNRI), enables high-quality reconstructions and edits, outperforming existing inversion and editing techniques in both accuracy and speed. Notably, GNRI requires no model training, fine-tuning, prompt optimization, or additional parameters. This work has been accepted for publication at ICLR 2025.
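
A toy illustration of the numerical scheme the talk builds on (not GNRI itself): inverting a differentiable step z_prev = f(z) by Newton-Raphson on the residual, under a diagonal-Jacobian assumption that is a common simplification for diffusion latents. Here `f` is a stand-in, not the actual diffusion step.

```python
# Elementwise Newton-Raphson: solve r(z) = f(z) - z_prev = 0,
# approximating the Jacobian of f as diagonal.
import torch

def newton_invert(f, z_prev, n_iters=5, eps=1e-6):
    z = z_prev.clone()                               # initialize at the target
    for _ in range(n_iters):
        z = z.detach().requires_grad_(True)
        r = f(z) - z_prev                            # want r(z) = 0
        (diag,) = torch.autograd.grad(r.sum(), z)    # = diag(J_f) if J_f is diagonal
        z = z - r / (diag + eps)                     # elementwise Newton step
    return z.detach()
```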

Ben Vardi

PhD Student, Reichman University

Bio:

Ben is a Computer Science PhD student at Reichman University, supervised by Professor Ariel Shamir. His research focuses on the robustness and limitations of vision-language models. Previously, he worked as a computer vision engineer at Snap. He holds an MSc in Computer Science from Ben-Gurion University and a BSc in Biology and Psychology with an emphasis on Neuroscience from Tel Aviv University.

Title:

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering

Abstract:

Recent Vision-Language Models (VLMs) excel in tasks like multiple-choice Visual Question Answering (VQA) but often struggle with unanswerable questions, providing incorrect responses to irrelevant queries. We propose CLIP-UP, a lightweight method leveraging CLIP-based alignment to equip VLMs with the ability to detect and withhold answers to unanswerable multiple-choice VQA. By training a few additional layers while preserving the original weights, CLIP-UP applied to LLaVA models achieves state-of-the-art performance on the MM-UPD benchmark for unanswerable question detection.
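
A minimal sketch of the kind of lightweight head this suggests, assuming frozen CLIP embeddings as inputs (the exact CLIP-UP design may differ): a small trained MLP scores whether an image-question pair is answerable, leaving the VLM's weights untouched.

```python
# Illustrative answerability head over frozen CLIP image/text embeddings.
import torch
import torch.nn as nn

class AnswerabilityHead(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, image_emb, text_emb):
        # Concatenate the two CLIP embeddings and score answerability.
        return self.mlp(torch.cat([image_emb, text_emb], dim=-1))
```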

Nizan Mashall

M.Sc. Student, Technion - Israel Institute of Technology

Bio:

Nizan Mashall is a mechanical engineer and a master’s student in the Technion’s Autonomous Systems and Robotics program. He previously worked at Rafael and BeautAI. Under the joint supervision of Prof. Miriam Zacksenhouse and Prof. Erez Karpas, Nizan develops artificial intelligence for autonomous robotic capabilities. His current work on affordance grounding in 3D space was accepted to the AAAI Bridge Program for Foundation Models and Robotics (PLAN_FM).

Title:

From Semantic Understanding to Geometric Features: Using Foundation Models for Novel Robotic Tasks

Abstract:

Foundation Models possess implicit knowledge of objects and their use. Our system leverages this to detect geometric features (vertices, edges, planes) and define task-specific coordinate systems, enabling model-based planners to execute manipulation tasks without task-specific training. Unlike current vision-language approaches requiring demonstrations or fine-tuning, our fully automated pipeline integrates five pre-trained foundation models. It generates digital twins, identifies key geometric features, and transfers these detections to real-world object point clouds. This approach achieves robust zero-shot performance across diverse objects and tasks, advancing adaptable robotic systems for autonomous manipulation in dynamic environments.

Oran Shayer

AI Research Lead, AppsFlyer

Bio:

Oran has over a decade of experience in machine learning, computer vision, and AI. He has a track record in both pure and applied research, leading AI research groups and managing end-to-end research-to-production processes. Prior to AppsFlyer, Oran worked at startups as well as large corporations (Apple, GM, Intel). Oran holds a B.Sc. and M.Sc. in electrical engineering from the Technion.

Title:

Designing VLM-Based AI Agents for Large-Scale Video Analysis

Abstract:

This session dives deep into innovative architectures for building AI agents based on vision-language models (VLMs) for large-scale video processing pipelines. By reflecting on key aspects of VLMs' reasoning and processing mechanisms, we propose several agent architectures designed to fully leverage their capabilities. Through practical examples, we'll explore strategies to maximize VLMs' visual understanding capabilities and to design production-grade, VLM-centric agents, while also suggesting more cost-effective alternatives.

Ben Fishman

Ph.D. Candidate, Bar-Ilan University

Bio:

Ben Fishman is an AI & Algorithms researcher and manager with 11 years of experience in both industry and academia. Today he is a Computer Science Ph.D. candidate at Bar-Ilan University, supervised by Prof. Gal Chechik and Dr. Idan Schwartz, focusing on Generative AI and multimodal learning. Prior to that, he served as a Director of AI & Algorithms at Microsoft and as a team leader at Mobileye.

Ben’s main areas of interest include Computer Vision, Speech & Audio, Deep Learning, and LLMs. He holds an M.Sc. in Electrical Engineering and a B.Sc. in Biomedical Engineering both from Tel Aviv University.

Title:

Bringing AI to Production

Abstract:

Today, 85%-90% of AI projects fail to become real-world applications. In this session, we will explore the end-to-end lifecycle of an AI project, focusing on the challenges encountered at each stage and how to address them.

Reaching production involves much more than training a deep learning model, and therefore we will highlight best practices in other crucial areas: data management (collection, annotation, curation, and engineering), infrastructure (MLOps, DevOps, tools) for an efficient and smooth development process, and algorithm evaluation for an effective process that yields insights for the data science team.

The session will provide actionable insights to help AI projects succeed in delivering real-world solutions.

Omri Azencot

Senior Lecturer, Ben-Gurion University of the Negev

Bio:

Omri Azencot is a Senior Lecturer (Assistant Professor) in the Department of Computer Science at Ben-Gurion University of the Negev, Israel. His research focuses on deep learning, sequence models, and dynamical systems. He completed his postdoctoral research at the Department of Mathematics at the University of California, Los Angeles, under the mentorship of Prof. Andrea Bertozzi. Omri earned his PhD in Computer Science from the Technion – Israel Institute of Technology, where he was advised by Prof. Mirela Ben-Chen. Prior to that, he obtained dual BSc degrees in Computer Science and Mathematics from the Technion. His research has been supported by an ISF grant, the Adams Fellowship from the Israel Academy of Sciences and Humanities, a Zuckerman Postdoctoral Fellowship, and a Marie Sklodowska-Curie Actions International Fellowship.

Title:

Breaking Barriers in Time Series Generation: From Koopman Dynamics to Image-Based Models

Abstract:

Generating realistic time series data is crucial for various scientific and engineering applications. While generative adversarial networks (GANs) dominate this field, they suffer from instability and mode collapse. Variational autoencoders (VAEs), though more stable, remain underexplored. In this talk, I will introduce Koopman VAE (KoVAE), a novel VAE-based framework inspired by Koopman theory, which enables robust generation of both regular and irregular time series. Additionally, I will present a new paradigm that transforms time series into images, allowing the use of powerful diffusion vision models. Our approaches achieve state-of-the-art results across multiple benchmarks, bridging short- and long-range sequence modeling.
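
A minimal sketch of the Koopman-style ingredient named above, as an assumption-laden illustration rather than the KoVAE objective: the VAE's latent sequence is encouraged to evolve linearly under a learnable operator K, which is what makes the dynamics analyzable with linear tools.

```python
# Illustrative Koopman-style prior: penalize deviation from linear latent dynamics.
import torch
import torch.nn as nn

class KoopmanPrior(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        self.K = nn.Parameter(torch.eye(latent_dim))   # learnable linear dynamics

    def forward(self, z):                              # z: (batch, time, dim)
        pred = z[:, :-1] @ self.K.T                    # one-step linear prediction
        return ((pred - z[:, 1:]) ** 2).mean()         # penalty added to the ELBO
```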

Ido Sobol

MSc Student, Technion - Israel Institute of Technology

Bio:

Ido Sobol is an MSc student in Computer Science at the Technion, supervised by Prof. Or Litany. His research focuses on 3D computer vision and generative AI, with a particular emphasis on exploring the internal mechanisms of diffusion models.

Title:

Diffusion Models in the 3D Space: From Robust Inference to New Training Procedures

Abstract:

Diffusion models have come to dominate generative applications, but they still face challenges. This talk explores two of them:

1. Generation Artifacts: Image-based diffusion models, used for 3D tasks such as Novel View Synthesis, often produce artifacts. To address them, we introduce "Zero-to-Hero", a training-free attention filtering mechanism that enhances quality and condition enforcement.

2. Training Challenges: Training diffusion models requires large-scale datasets of the target modality, but 3D data is scarce. In our work "A Lesson in Splats", we propose a novel training strategy that decouples the denoised modality from the supervision modality, enabling 3D diffusion models to be trained using only 2D supervision.

Guy Singer

Algorithmic Research Lead, Visual Layer

Bio:

Guy Singer leads the research team at Visual Layer, where he develops novel algorithms for exploring, processing, cleaning, and enriching massive visual datasets. He also researches the theory of learning in neural networks at Tel Aviv University. Prior to this, he worked at the KLA Corporation, the University of Colorado Boulder, and the University of California Santa Barbara.

Title:

Garbage In, Garbage Out: How Label Noise Affects Vision Models, and What To Do About It

Abstract:

Label noise in computer vision datasets can have detrimental effects on model training, with larger datasets being needed to achieve the same post-training accuracy. We present evidence of the high prevalence of mislabels in standard academic datasets, a review of strategies for detecting and coping with these mislabels, and a release of open-source versions of these datasets that have been cleaned of their mislabels. As part of this effort, we also present LabelRank: a novel and state-of-the-art method for detecting mislabels in a distribution-agnostic manner.
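
For illustration, one standard mislabel-detection baseline from the literature such a review covers: flag samples whose cross-validated confidence in their own label is low. This is a generic sketch, not the LabelRank method; the classifier and threshold are illustrative.

```python
# Generic mislabel detection via low self-confidence under cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def suspect_mislabels(features, labels, threshold=0.2):
    """features: (N, D) array; labels: integer labels 0..K-1."""
    proba = cross_val_predict(LogisticRegression(max_iter=1000),
                              features, labels, cv=5, method='predict_proba')
    conf_in_given_label = proba[np.arange(len(labels)), labels]
    return np.where(conf_in_given_label < threshold)[0]   # likely mislabeled
```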

Ido Marcus

Director of Research, Trigo

Bio:

Ido, a member of Trigo's founding team, manages the company's interdisciplinary research group. Trigo specializes in computer-vision-based solutions for physical retail stores, such as category-leading autonomous checkout and theft detection systems. Ido's group develops cutting-edge AI-in-the-wild algorithms, serving millions of shoppers around the world, at scale and in real time.

Ido’s background as a researcher and a research engineering manager spans various fields such as deep learning, multiple-view geometry, and tracking algorithms. He holds BSc and MSc degrees in physics from the Hebrew University of Jerusalem and the Weizmann Institute of Science.

Title:

Anonymized tracking of hundreds of people across thousands of cameras, online, in autonomous retail stores

Abstract:

At Trigo we developed the world’s most advanced fully autonomous store, serving shoppers with accurate receipts in real-time. The underlying problem, namely to track all people and “understand” their interactions with products, is complex, multi-tiered and multi-faceted. Hence, solving it requires harnessing the latest advancements and exploring new frontiers across multiple algorithmic disciplines.

In my talk I'll touch upon the various components at play and focus on our unique multi-view, multi-person tracking system, which sets a new state of the art in terms of scale and accuracy, is robust to changing environments, and adheres to strict privacy regulations.

Roie Cohen

Senior Algorithm Developer, Align Technology

Bio:

Roie Cohen is a Senior Algorithm Developer at Align Technology. His work focuses on identifying optimal NIR images for dental caries detection by combining optical-geometrical insights with traditional and AI-based computer vision techniques. Roie earned his PhD in Physics from Tel Aviv University, where he was honored with multiple awards for his biophysical research in cellular mechanics. His groundbreaking work has significantly advanced treatments for hearing deficiencies.

Align Technology, an S&P 500 company, leads the global dental industry with its Invisalign clear aligners and iTero 3D intraoral scanners, revolutionizing dental treatment through advanced 3D imaging and CAD/CAM technologies.

Title:

Combining NIR imaging with cutting-edge AI for non-radiation diagnostics

Abstract:

Screening for dental caries is a common practice aimed at early detection and prevention of invasive treatments. Using non-ionizing-radiation methods like Near-infrared (NIR) imaging eliminates the exposure to harmful X-rays while allowing accurate diagnosis.

NIR imaging leverages the teeth's partial transparency to visualize carious lesions. However, it faces challenges like internal and specular reflections that deteriorate image quality. Our innovative solution combines classical methods with machine learning, using geometrical optics and image data.

This approach produces high-contrast, clear NIR images, enabling quick and precise diagnoses.

This allows for frequent, radiation-free patient monitoring with minimal inconvenience, revolutionizing dental diagnostics.

Andrey Gurevich

AI Algorithms Principal Engineer, Mobileye

Bio:

Andrey Gurevich is an AI Algorithms Principal Engineer at Mobileye, developing computer vision systems for autonomous driving. He leads research on LLM-based object detection, unsupervised knowledge distillation, efficient fine-tuning and efficient real-time DNN architectures, driving high-impact AI projects. Andrey holds an MSc in Electrical and Computer Engineering from Ben-Gurion University, where he researched sequential anomaly detection under Prof. Kobi Cohen.

Title:

Computationally Efficient Transformer for Autonomous Driving

Abstract:

Autonomous driving requires efficient processing of multi-modal, high-dimensional data. Transformer models, while powerful, face computational bottlenecks, particularly in attention mechanisms and softmax operations. This work introduces two novel architectures: Linear Perceiver, which integrates No-Softmax Linear Attention with dimensionality reduction, and STAT (Sparse Typed Attention), which hierarchically processes multi-camera inputs using learnable latent tokens. These approaches significantly reduce complexity, achieving up to 97% fewer softmax operations and up to 20× speedup over traditional transformers, with minimal performance trade-offs. The proposed methods are scalable across hardware platforms and extendable beyond automotive applications.
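
For background, a minimal sketch of softmax-free linear attention, the general mechanism behind the "No-Softmax Linear Attention" named above (the feature map and details here are a common illustrative choice, not necessarily Mobileye's formulation): replacing softmax(QK^T)V with phi(Q)(phi(K)^T V) drops the cost from O(N^2 d) to O(N d^2).

```python
# Generic linear attention with a positive feature map phi(x) = elu(x) + 1.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, seq, dim) tensors."""
    q = F.elu(q) + 1.0                         # positive feature map phi
    k = F.elu(k) + 1.0
    kv = torch.einsum('bnd,bne->bde', k, v)    # (dim, dim) summary, O(N d^2)
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)
```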

Roni Goldshmidt

Senior AI Researcher, Nexar

Bio:

Roni Goldshmidt is a Senior AI Researcher at Nexar, specializing in vision-language models (VLMs) and diffusion-based video generation. His work focuses on developing interpretable and efficient AI models, leveraging large-scale datasets to improve predictive accuracy and real-time decision-making. His research in interpretability and multimodal learning aims to enhance transparency and usability in AI-driven perception systems, bridging academic work and practical AI applications.

Title:

PixelSHAP: Extending TokenSHAP for Vision-Language Models (VLMs)

Abstract:

Building upon the TokenSHAP framework for interpreting Large Language Models (LLMs), we extend this methodology to Vision-Language Models (VLMs). Our approach, PixelSHAP, applies Monte Carlo Shapley value estimation to image data, treating different regions as token groups. This enables interpretable heatmaps that highlight the most influential visual elements in a model’s decision-making process. Evaluations on image captioning, visual question answering, and zero-shot classification tasks demonstrate PixelSHAP’s effectiveness in improving VLM interpretability. By offering structured insights into model attention mechanisms, PixelSHAP facilitates debugging, bias analysis, and enhanced trust in multimodal AI systems.
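
A schematic sketch of Monte Carlo Shapley estimation over image regions, the mechanism described above; `model_score` and `mask_regions` are assumed helpers, and the sampling scheme is the textbook permutation estimator rather than PixelSHAP's exact procedure.

```python
# Permutation-sampling Shapley values over image regions: each region's
# attribution is its average marginal effect on the model's score.
import numpy as np

def shapley_regions(image, regions, model_score, mask_regions, n_samples=200):
    """mask_regions(image, visible) masks out every region not in `visible`."""
    values = np.zeros(len(regions))
    for _ in range(n_samples):
        order = np.random.permutation(len(regions))
        present = []
        prev = model_score(mask_regions(image, present))   # all regions masked
        for r in order:
            present.append(r)
            cur = model_score(mask_regions(image, present))
            values[r] += cur - prev                        # marginal contribution
            prev = cur
    return values / n_samples
```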

Daniel Winter

Research Engineer, Google & The Hebrew University of Jerusalem

Bio:

Daniel is a Research Engineer at Google, where he focuses on advancing generative image models and developing innovative editing capabilities. He holds an M.Sc. in Computer Science, supervised by Prof. Yedid Hoshen, and a B.Sc. in Mathematics and Computer Science, both from The Hebrew University of Jerusalem.

Title:

Supervised Image Editing with Diffusion Models

Abstract:

Diffusion models have revolutionized image editing but often struggle to follow user intent or produce photorealistic edits. We argue that the next performance leap will come from high-quality, carefully curated supervision designed to simulate real-world scenarios. First, we introduce ObjectDrop, an object removal model trained on a small counterfactual dataset of image pairs captured before and after removing an object. Second, we present ObjectMate, a subject-driven generation and object insertion method achieving state-of-the-art photorealism with exceptional identity preservation. To create paired data, we leverage the observation that, in large unlabeled datasets, many objects reappear in diverse scenes and poses. These works showcase the power of supervised image editing.

Ofir Hadar

AI Team Leader, DeePathology

Bio:

Ofir Hadar is an AI team leader and applied researcher with a strong background in computer science, holding a master's degree in the field. With expertise in computer vision, deep learning, and machine learning, he brings a wealth of knowledge to his role. Ofir has vast experience in AI for computational pathology, working on many different problems from diagnostics to personalized health care and pharmaceuticals.

Title:

Segmentation by Factorization: Unsupervised Semantic Segmentation for Pathology by Factorizing Foundation Model Features

Abstract:

We introduce Segmentation by Factorization (F-SEG), an unsupervised segmentation method for pathology that generates segmentation masks from pre-trained deep learning models. F-SEG allows the use of pre-trained deep neural networks, including recently developed pathology foundation models, for semantic segmentation. It achieves this without requiring additional training or fine-tuning, by factorizing the spatial features extracted by the models into segmentation masks and their associated concept features. We create generic tissue phenotypes for H&E images by training clustering models for multiple numbers of clusters on features extracted from several deep learning models on TCGA [1], and then show how the clusters can be used for factorizing corresponding segmentation masks using off-the-shelf deep learning models. Our results show that F-SEG provides robust unsupervised segmentation capabilities for H&E pathology images, and that the segmentation quality is greatly improved by utilizing pathology foundation models. We discuss and propose methods for evaluating the performance of unsupervised segmentation in pathology.
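
A minimal sketch of the factorization idea, with k-means standing in for the paper's clustering/factorization models: per-location features from a frozen backbone are clustered and the labels reshaped into a segmentation mask.

```python
# Cluster spatial features of a frozen model into a pseudo-semantic mask.
import numpy as np
from sklearn.cluster import KMeans

def features_to_mask(feat_map, n_clusters=8):
    """feat_map: (H, W, C) spatial features from a frozen backbone."""
    h, w, c = feat_map.shape
    flat = feat_map.reshape(-1, c)                  # one feature vector per location
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    return labels.reshape(h, w)                     # unsupervised segmentation mask
```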

Leah Bar

Senior Researcher, Autobrains & Tel-Aviv University

Bio:

Leah Bar holds a B.Sc. in Physics, an M.Sc. in Biomedical Engineering, and a Ph.D. in Electrical Engineering from Tel Aviv University. She completed her postdoctoral fellowship in the Department of Electrical Engineering at the University of Minnesota. Currently, she is a senior researcher at Autobrains and also a researcher in the Applied Mathematics Department at Tel Aviv University. Her research interests include machine and deep learning, image processing, computer vision, and inverse problems.

Title:

Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval

Abstract:

Active Learning (AL) is a user-interactive approach aimed at reducing annotation costs by selecting the most crucial examples to label. Although AL has been extensively studied for image classification tasks, the specific scenario of interactive image retrieval has received relatively little attention. This scenario presents unique characteristics, including an open-set and class-imbalanced binary classification, starting with very few labeled samples. We introduce a novel batch-mode Active Learning framework named GAL (Greedy Active Learning) that better copes with this application. It incorporates new acquisition functions for sample selection that measure the impact of each unlabeled sample on the classifier. We further embed this strategy in a greedy selection approach, better exploiting the samples within each batch.  We evaluate our framework with both linear (SVM) and non-linear MLP/Gaussian Process classifiers. For the Gaussian Process case, we show a theoretical guarantee on the greedy approximation. Finally, we assess our performance for the interactive content-based image retrieval task on several benchmarks and demonstrate its superiority over existing approaches and common baselines.
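
A schematic sketch of greedy batch-mode selection in the spirit described above; the impact-based acquisition itself is abstracted into a callback, so this shows only the greedy outer loop (not the GAL acquisition functions themselves):

```python
# Greedy batch construction: each pick maximizes the acquisition score
# conditioned on the samples already tentatively added to the batch.
import numpy as np

def greedy_batch(acquisition, batch_size):
    """acquisition(chosen) -> np.ndarray of scores over the unlabeled pool,
    given the samples tentatively chosen so far."""
    chosen = []
    for _ in range(batch_size):
        scores = acquisition(chosen).astype(float)
        scores[chosen] = -np.inf              # never re-pick a chosen sample
        chosen.append(int(np.argmax(scores)))
    return chosen
```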
