Gershon Celniker

R&D Lab Group Manager General Motors

Bio:

Gershon Celniker is an R&D Lab Group Manager at GM, previously a Principal Data Scientist at Verint, Check Point and Chief Data Scientist at Wiser. He holds a BSc from Technion Institute and a MSc from Hebrew University in Bioinformatics and Machine learning applications with vast academic experience as a fellow CS researcher from Weizmann institute and Tel-Aviv University. Currently, his main areas of research interest lie in the design of AI and CV algorithms and their applications in the Automotive industry.

Title:

Understanding and modeling gaze patterns in the automotive environment

Abstract:

intuitively, when a person is relaxed and has no task to perform, one tends to look at salient objects in the field of view (bottom-up). As tasks are introduced and workload increases, one usually tends to select a more task-oriented gaze behavior (top-down) and a shift from salient objects-oriented gaze patterns to important objects-oriented gaze patterns can be observed. In the automotive environment, this shift between gaze pattern types and their linkage to the driver or passengers’ states suggests that modeling a gaze pattern can lead to an understanding of one’s state and vice versa.

Gaze patterns were modeled by training both deep learning networks and statistical models. Deep learning networks were trained to digest effectively larger datasets, and statistical models were selected for their simplicity and explainability. A set of experiments was conducted both in real-world setups and in simulated environments. The real-world experiments took place in Israel and the USA while modeling the behavior of drivers and passengers. Overall, our results supported our assumptions and can be divided into two types: prediction of expected gaze patterns given the environment and establishing a linkage between gaze patterns and the driver’s and passenger’s state.

Roy Orfaig

Blue white robotics and Tel-Aviv Unviersity

Bio:

Dr. Alona Strugatski Faktor

Postdoctoral fellowWeizmann Institute of Science

Bio:

Alona Strugatski-Faktor is a Postdoctoral fellow at the Weizmann Institute of Science. Alona's research focuses on cognitive capabilities of AI models and visual scene interpretation.
She is specifically interested in combining human vision research with state-of-the-art AI models. Alona holds a B.Sc. in Physics and Electrical Engineering from the Technion,

a M.Sc. in Electrical Engineering from Tel Aviv University and a PhD in Mathematics and Computer Science from the Weizmann Institute of Science.

Title:

Why does Visual-Language Models Struggle with Scene Structure Extraction

Abstract:

Though the huge breakthrough in vision-language models, they are still far from achieving human-level scene-understanding and have several fundamental limitations.
We show that these models are not able to perform simple tasks such as questions regarding locations and relations between objects. We suggest a model which can
naturally answer such questions and achieve scene-understanding even for complex scenes. It does this by using an iterative goal-driven approach that resembles
human vision. Our model is able to focus its attention in each iteration on the relevant parts of the scene and thus iteratively build a complex understanding of the scene.

Yochai Yemini

PhD studentBar-Ilan University and OriginAI

Bio:

Yochai Yemini is a PhD student at Bar-Ilan University, under the supervision of Prof. Sharon Gannot and Dr. Ethan Fetaya. He is also a deep learning researcher at OriginAI. His areas of interest include computer vision, speech processing and their intersection, and his current research focuses on deep learning methods for audio-visual tasks.

Title:

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Abstract:

In the lip-to-speech task, the objective is to accurately generate the missing speech for a soundless video of a person talking. It is required, e.g., when the speech signal is completely obfuscated by background noises. In this talk, I will present LipVoicer, a novel approach for producing high-quality speech for in-the-wild silent videos. LipVoicer leverages the transcription of the speech we wish to generate as predicted by a lip-reading model, and a diffusion model conditioned on the video to generate mel-spectrograms. LipVoicer achieves exceptional results, and the generated speech sounds natural and synchronized to the lip motion.

Boris Greenberg

VP of XR SolutionsVoxelSensors

Bio:

Boris Greenberg leads the Spatial and Empathic Computing Solution team at VoxelSensors. He has over 21 years of expertise in multi-disciplinary R&D within the high-tech industry and academia. His previous roles include founding EyeWay Vision, I.C. Inside, and serving as an R&D lead in Automated Optical Inspection at Orbotech. Notably, Mr. Greenberg holds more than 20 patents and pursued his studies in physics at the Hebrew University of Jerusalem.

Title:

Low-Power, Low-Latency Perception for XR

Abstract:

XR devices immerse users in augmented realities, seamlessly merging digital and physical realms. Achieving this demands advanced perception technology that is resilient across environments, low in power usage, and has minimal latency. Yet, existing solutions struggle to meet this demanding combination of requirements, even with Apple’s Vision Pro setting a new standard for XR glasses’ 3D perception.

VoxelSensors’ Active Event Sensors (AES) enable robust, low-power, low-latency 3D sensing using laser triangulation. This innovation enhances SLAM, odometry, gesture recognition, and tracking, potentially revolutionizing augmented reality experiences. Boris will outline this groundbreaking approach and perspectives in XR 3D perception.

Ofer Lavi

CEOdataspan.ai

Bio:

With his long career in machine learning, Ofer has learned data is the biggest obstacle to implementing successful AI projects. His company, dataspan.ai, uses generative artificial intelligence to assist teams in creating better computer vision applications.

His last position was as a program manager for IBM Research AI, responsible for natural language processing and artificial intelligence for customer care. Prior to that, he managed IBM Research Haifa's Machine Learning Technology group. Bringing AI from research to production, he published more than twenty peer-reviewed papers and patents.

Title:

Can AI train AI?

Abstract:

This talk will demonstrate how Generative AI can be adapted to augment datasets for computer vision training. Utilizing diffusion models, we enhance dataset quality by seamlessly implanting concepts into images, improving downstream model performance. We provide an algorithmic framework for localizing the appropriate place for implanting a concept and for the actual generation of the concept given the background. To address the stochastic nature of diffusion models, which may generate images that do not contribute to the improvement of downstream models, we employ clustering and filtering, maximizing dataset relevance. We demonstrate the methods on both public and real world datasets.

Omri Danziger

Computer Vision ResearcherForesight Autonomous

Bio:

Bio:

Elad Levi is a machine learning engineer at Sightful, a startup that is creating the first AR laptop. His work focuses on leveraging multimodal inputs (in particular vision and language) in order to build a novel AR operating system. Elad received a PhD degree in mathematics from the Hebrew University. His thesis was in the field of model-theoretic with applications to combinatorics problems.

Title:

Democratizing Large Language Models

Abstract:

Large language models (LLMs) have emerged as a breakthrough technology, exhibiting remarkable performance across a wide range of tasks. Until recently, the development of LLMs seemed constrained by high barriers, resulting in a few companies dominating the field. However, recent advancements in the field have significantly lowered these barriers, enabling the development of high-quality LLMs with a limited amount of effort and computation resources.

In this tutorial, we will explore the challenges involved in building LLM models, the development that allows building such high-performance custom models with a small amount of resources, and the new possibilities it unlocks, including multimodal extension and expanded context windows.

Eyal Hanania

MSc student in the Electrical and Computer Engineering faculty Technion

Bio:

Eyal is currently pursuing his MSc in Electrical and Computer Engineering at the Technion, mentored jointly by Dr. Moti Freiman (Faculty of Biomedical Engineering) and Prof. Israel Cohen (Faculty of Electrical and Computer Engineering). His ongoing research is centered on creating deep-learning models with physical constraints for motion correction in medical imaging. Alongside his academic work, Eyal serves as an AI Research Intern at GE Research. He brings several years of industrial experience as an algorithm and computer vision engineer to his role. He earned his B.Sc in Electrical and Computer Engineering from the Technion.

Title:

Free-breathing myocardial T1 mapping with Physically-Constrained Motion Correction

Abstract:

T1 mapping is a quantitative MRI technique that has emerged as a valuable tool in the diagnosis of diffuse myocardial diseases. However, prevailing approaches have relied heavily on breath-hold sequences to eliminate respiratory motion artifacts. This limitation hinders accessibility and effectiveness for patients who cannot tolerate breath-holding. We address this limitation by introducing PCMC-T1, a physically-constrained deep-learning model that accounts for the signal decay along the longitudinal relaxation axis for motion correction in free-breathing T1 mapping. PCMC-T1 demonstrated superior results compared to baseline methods using a 5-fold experimental setup on a publicly available dataset of 210 patients.

In this talk, we will cover our latest work in the VL field, with topics such as Foundation Models for Expert Task Applications, Understanding Structured Vision and Language Concepts, and more.

Yair Adato

Co-founder & CEO BRIA AI

Bio:

Bio:

I hold a BSc and MSc in Physics with research in the field of Superconductivity.

I have 13 years of experience in the Semiconductors industry in leading companies like Intel, Nova, and Applied Materials filling various positions in data science and algorithm development.

Co-inventor of 10 patents in this field in machine and deep learning.

Currently, leading an algorithm group developing deep learning solutions for next-generation Anomaly Detection and Segmentation industry-specified solutions.

Title:

Generating the Perfect Reference: Anomaly Detection Via Fusion of Stochastic and Deterministic Learning

Abstract:

Defect Detection in the Semiconductor industry is an extreme case of Anomaly Detection comprised of very small anomalies often well-harmonized in the background pattern.

Although very similar, no reference sample is perfect for comparison due to production variation.

We propose to generate this perfect reference – a generated image counterpart that is identical to the input sample everywhere, but the defective area.

We are using a novel fusion of stochastic and deterministic learning to train a conditional generative deep VAE model.

We demonstrate perfect reference generation for MVTec dataset and silicon manufacturing Scanning Electron Microscope (SEM) images achieving industry SOTA results.

Dr. Yossi Rubner

CEORTC Vision

Bio:

Yossi Rubner serves as the CEO of RTC Vision, a company focused on the development and implementation of AI and Computer Vision technologies.

Rubner has combined industrial and academic roles for more than 25 years, and he is also the founder and CTO of Kitov.ai.

He earned his B.Sc. in Computer Engineering at the Technion Institute of Technology, followed by a Ph.D. in Computer Science and Electrical Engineering from Stanford University, specializing in Computer Vision.

Rubner is the author of more than 30 publications and patents and his contributions to the field were recognized in 2013 when the IEEE Computer Society awarded him the Helmholtz Prize.

Title:

From Spine Surgery to Body Identification: Computer Vision’s Role in Solving October 7th’s Forensic Challenges

Abstract:

Abstract:

The field of computer vision is in the midst of a generative revolution, demonstrating groundbreaking image synthesis results, portraying highly complex visual concepts such as objects’ interaction, lighting, 3D shape, and pose. Expanding this progress to videos introduces two key challenges: (i) the distribution of natural videos is vast and complex, requiring orders of magnitude more training data than images, and (ii) raw video data is extremely high dimensional, requiring extensive computation and memory. In this talk, I’ll present different methodologies aimed at overcoming these challenges and advancing our capabilities to synthesize and edit visual content across both space and time. These methods range from layered video representations tailored to a specific video, to leveraging generative image priors for video synthesis tasks, and finally, designing and harnessing large-scale text-to-video models, which provides us with powerful motion priors. I’ll demonstrate how these methods unlock a variety of novel content creation applications, such as transferring motion across distinct object categories, image-to-video synthesis, video inpainting, and stylized video generation.

Dr. Chen Sagiv

Co Founder & Co CEOSagivTech

Bio:

Chen Sagiv earned her PhD in Applied Mathematics from the Tel Aviv University focusing on variational methods and Gabor Analysis.

After working as algorithms developer, she became a parallel entrepreneur and co founded SagivTech, a computer vision projects company, DeePathology working in AI for computational pathology and SurgeonAI working on bringing AI to the OR.

Chen is also co founder of IMVC.

Chen is passionate about bringing technology to healthcare, promoting Math education to at risk youth and dogs.

Title:

Introduction to Transformers

Abstract:

Transformers are neural networks that learns context from relationships in sequential data using a mechanism called attention.

The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team.

While transformer models are basically large encoder/decoder blocks that process data they also have an attention ingredient that allows them to detect patterns in data.

In this session, a brief introduction to the foundations of transformers will be given.

Morris Alper

Tel Aviv University

Bio:

Morris Alper is a PhD student at the School of Electrical Engineering, Tel Aviv University (TAU). Under the mentorship of Dr. Hadar Averbuch-Elor, he is researching multimodal learning – machine learning applied to tasks involving vision and language. He received his MSc with honors from TAU (Computer Science), and his BSc from MIT (Mathematics and Linguistics).

Title:

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

Abstract:

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regards to cross-modal associations between language and the visual domain. In this work, we investigate vision-and-language models such as CLIP and Stable Diffusion and find strong evidence that they do display sound symbolic patterns, paralleling the well-known kiki–bouba effect in psycholinguistics.

Shira Weinberg Harel

Product and AI Consultant

Bio:

With over two decades of experience in the tech industry, Shira has held prominent product and AI leadership roles at companies like Microsoft and monday.com. As a co-founder of LeadWith, a non-profit organization dedicated to empowering women in the tech field, she is passionate about promoting diversity and inclusivity in the industry. Shira’s contributions have been recognized by being selected for the esteemed 40 under 40 list by Globes. Currently, she works as an independent consultant and speaker, leveraging her expertise to mentor product managers and impart knowledge through her own academy. Additionally, her influential podcast serves as a platform for engaging discussions on all aspects of product management.

Title:

Artificial Intelligence, Real Biases: Examining Gender Biases in AI

Abstract:

Have you ever wondered why when you ask Midjourney for pictures of drivers, only images of men appear? Or why Siri's default voice is female?
The lecture explores the existence of gender biases in artificial intelligence and how they impact our understanding of reality.

As AI continues to permeate our lives, it's crucial to recognize that it is not always neutral. From Google Translate to cutting-edge generative AI platforms, this lecture will examine the gender biases present in various technologies. Through engaging examples, attendees will gain a deeper understanding of the issue and learn about available tools to address and create a more equitable future.

Dr. Itzik Ben Shabat

Research FellowThe Australian National University and Technion, Israel Institute of Technology

Bio:

Bio:

Bella is a lecturer in the Medical Imaging Sciences department at Haifa University. She completed her PhD at the Hebrew University under the supervision of Prof. Leo Joskowicz. Prior to that, Bella worked on medical imaging algorithms for 7 years at Philips. She is also the founder and organizer of the Machine Learning for Medical Imaging (MLMI) and Haifa Machine Learning meetups.

Solving 3D Human Pose Ambiguities with Quadratic Programming

Abstract:

3D human pose estimation (HPE) is a fundamental task in human-computer interaction. Monocular 3D HPE is a challenging task due to a lack of in-the-wild annotated data, high computational load, and accessibility to depth observations. Despite their success on 2D HPE, end-to-end DNN approaches to 3D HPE hardly generalize to in-the-wild scenes with multiple self-occlusions. Recent approaches suggested optimizing 3D humanoid model parameters to minimize a 2D objective, however, as they optimize in 2D, they suffer from depth ambiguities.

We propose a two-stage depth-based solution to monocular 3D HPE. We start by using a deep neural network to predict 2D body-joint locations and to classify joints as occluded or visible. Then, having valid depth for the visible joints, we solve a Quadratically Constrained Quadratic Program enforcing skeletal and temporal-continuity constraints and thereby solving the self-occlusion problem. We demonstrate our method effectiveness on Gentex’s in-cabin 180 degrees fisheye depth camera and show that it can reconstruct reliable 3D human pose in complex situations.

Zvi Figov

Principal Data ScientistMicrosoft

Bio:

Zvi Figov is a data scientist with over 20 years of experience in various computer vision fields. He currently works in the Azure Video Indexer group at Microsoft. He holds a BSc and MSc in computer science and mathematics from Bar-Ilan University. Zvi has vast experience in computer vision applications, including deep learning, object detection and tracking. Since joining the Video Indexer group 4 years ago Zvi has also been working on creating solutions based on multimodality analysis, combining vision, audio and NLP.

Title:

Person tracker for Media and Entertainment videos

Abstract:

Abstract:

Many of the tasks we face as data scientists or machine-learning researchers relate to categorization in one way or the other. In the words of David Mumford: "The world is continuous, but the mind is discrete." We often define categories when breaking down a real-world problem into an ML-based solution. However, actual target values may be continuous or at least ordered. This is something to consider and even leverage in the design of your ML model.

Title:

Using AI for 3D LiDAR perception in L3 Autonomous driving

Abstract:

In order to improve safety in autonomous driving it is important to use multiple sensors. In this talk we will discuss how to perform perception from 3D data acquired by the Innoviz lidar. Specifically we will describe how to identify obstacles or moving objects and perform detection in 3D.

Title:

How to enable running DL Networks in the car

Abstract:

Deep learning networks have emerged as a prominent technology for the accurate detection and identification of objects on the road, empowering the way to fully autonomous cars. In the pursuit of cost-effective and power-efficient solutions, this presentation delves into the challenges and strategies associated with optimizing deep learning networks for real-time execution in the context of vehicular environments.

Given the stringent constraints of cost-efficiency and low power consumption, a key challenge lies in achieving high performance without compromising accuracy.

Several optimization techniques are elucidated contribute to network efficiency, with a focus on minimizing computational redundancy and encouraging resource-efficient execution.

Through a compelling case study, the effectiveness of these optimization techniques is demonstrated in the real-world scenario of running a transformer network within an automotive system.

Tom Sharon

Master's student Weizmann Institute of Science

Bio:

Tom Sharon received the B.Sc degree in science physics with Summa Cum Laude, focusing on mathematics and physics from The Open University,Israel, in 2021. Presently, she’s completing her M.Sc. in mathematics and computer science from the Weizmann Institute, Rehovot,Israel, under Prof. Yonina Eldar’s supervision.

Her research interests include intersection between deep-learning and computer vision methods to physics challenges, including medical application. Her work focus on electromagnetic and acoustics signals for medical imaging using deep-learning methods such as model-based neural networks,and solving inverse scattering problems for quantitative imaging.

Her awards include the scholarship for excellence master’s degree students in high-tech fields.

Title:

Real-Time Model Based Quantitative Radar

Abstract:

Ultrasound and radar signals are beneficial for medical imaging due to their non-invasive and low-cost nature. Quantitative medical imaging can display various physical properties of the scanned medium, in contrast to traditional imaging techniques. This broadens the scope of medical applications including fast stroke imaging. However, current quantitative imaging techniques are time-consuming and tend to converge to local minima. We propose a neural network based on the physical model of wave propagation, to achieve real time multiple quantitative imaging for complex and realistic scenarios,using data from only eight elements, demonstrated for diverse transmission setups using either radar or ultrasound signals.

Maya Gilad

PhD candidate Technion

Bio:

Maya is a PhD candidate at the Technion. Working under the supervision of Dr. Moti Freiman, she specializes in the development of innovative algorithms for medical imaging. Maya holds both a BSc and an MSc in Computer Science, having graduated Magna Cum Laude. With a diverse background in software engineering and machine learning, she has had the opportunity to lead engineering teams in both the IDF and the private tech sector. Before assuming her current role at Voyantis, Maya served as an Algorithms Architect at Gett. Her current research efforts focus on leveraging DWI-MRI to improve breast cancer treatment outcomes.

Title:

Integrating Radiomics and Physiological Decomposition of DWI

Abstract:

We introduce PD-DWI, a machine-learning model for early prediction of pathological complete response (pCR) in breast cancer patients undergoing neoadjuvant chemotherapy (NAC). Leveraging decomposed diffusion-weighted MRI (DWI) and clinical data, our model outperforms conventional methods in the BMMR2 challenge, achieving an area under the curve (AUC) of 0.8849 versus 0.8397. PD-DWI has the potential to enhance pCR prediction accuracy, reduce MRI acquisition times, and eliminate the need for contrast agents.