The PIXL lunch meets every Monday during the semester at noon in
room 402 of the Computer Science building. To get on the mailing
list to receive announcements, sign up for the "pixl-talks" list at
Monday, September 14, 2020
Computational Imaging with Light Waves
Modern imaging systems have evolved to capture light effectively. Through continuous development, our ability to capture the real world now rivals that of the human visual system. However, capturing light in full fidelity remains challenging, and one fundamental reason is that light has a dual nature as both particle and wave. Most imaging devices are not optimized to capture light as waves and therefore lose valuable information about the real world hidden in those wave properties. This talk covers several studies on capturing, analyzing, and exploiting overlooked dimensions of light waves, namely spectrum and polarization, in order to solve problems in computer graphics and vision. We develop computational imaging systems with the core principle of jointly designing optics and algorithms to decode the delicate interaction of light waves and materials. We demonstrate three specific applications: RGB-D imaging from polarimetric double refraction, spectrum from dispersion, and appearance from polarization.
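As background for how polarization carries extra scene information, a minimal sketch (not the speaker's system; the function name is illustrative) of recovering the linear Stokes parameters from four captures taken behind a linear polarizer at 0, 45, 90, and 135 degrees:

```python
import numpy as np

def stokes_from_polarizer(i0, i45, i90, i135):
    """Linear Stokes parameters from intensity images captured behind a
    linear polarizer at four orientations (in degrees)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)    # total intensity
    s1 = i0 - i90                          # horizontal vs. vertical preference
    s2 = i45 - i135                        # diagonal preference
    # Degree and angle of linear polarization, two cues invisible to RGB sensors
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-8)
    aolp = 0.5 * np.arctan2(s2, s1)
    return s0, dolp, aolp
```

Unpolarized light yields a degree of linear polarization near zero, while fully horizontally polarized light yields one; these per-pixel cues are what polarimetric systems decode.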
Monday, September 21, 2020
Deformable Style Transfer
Both geometry and texture are fundamental aspects of visual style. Existing style transfer methods, however, primarily focus on texture, almost entirely ignoring geometry. We propose deformable style transfer (DST), an optimization-based approach that jointly stylizes the texture and geometry of a content image to better match a style image. Unlike previous geometry-aware stylization methods, our approach is neither restricted to a particular domain (such as human faces), nor does it require training sets of matching style/content pairs. We demonstrate our method on a diverse set of content and style images including portraits, animals, objects, scenes, and paintings. More information can be found at sunniesuhyoung.github.io/DST-page.
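The core of an optimization-based approach like this is a joint objective over the image and a deformation field. A rough sketch of such an objective, assuming hypothetical names and a total-variation warp regularizer (the paper's actual losses operate on deep network features and learned keypoint matches):

```python
import numpy as np

def dst_objective(content_feats, style_feats, stylized_feats, warp,
                  alpha=1.0, beta=0.5, gamma=0.1):
    """Toy joint texture + geometry stylization objective.

    content_feats, style_feats, stylized_feats: (N, D) feature arrays
    warp: (H, W, 2) per-pixel deformation field
    """
    # Content term: stay close to the content image's features
    l_content = np.mean((stylized_feats - content_feats) ** 2)
    # Style term: match second-order (Gram) statistics of the style image
    gram = lambda f: f.T @ f / f.shape[0]
    l_style = np.mean((gram(stylized_feats) - gram(style_feats)) ** 2)
    # Geometry term: penalize non-smooth warps (total variation)
    l_warp = (np.mean(np.abs(np.diff(warp, axis=0)))
              + np.mean(np.abs(np.diff(warp, axis=1))))
    return alpha * l_content + beta * l_style + gamma * l_warp
```

Jointly minimizing over the stylized image and the warp field is what lets geometry deform toward the style while the regularizer keeps the deformation plausible.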
Monday, September 28, 2020
Bad2Clear: Domain Transfer for Adverse Weather
Adverse weather such as snow, rain, or fog detrimentally affects both human and computer vision, with serious potential consequences for safety-critical tasks in outdoor environments. Today's computer vision algorithms and models assume clear-weather inputs and thus fail when the input signal is perturbed by harsh weather conditions. These failures also occur with lidar sensors, whose SNR rapidly decays in dense fog. To address this long-standing challenge, we present a domain transfer method for converting RGB captures taken in adverse weather into clear daytime scenes. Our network training is supervised using a novel combination of synthetic and multi-modal real data. To facilitate training on real adverse weather scenes, we propose to employ gated imaging for supervision, which provides a high-contrast signal in spite of the weather conditions. We demonstrate significant improvements on real adverse weather scenes over state-of-the-art weather correction approaches.
Monday, October 05, 2020
Socially Situated Artificial Intelligence
Despite their prevalence, artificial intelligence agents have consistently, since the 1970s, been criticized as brittle; i.e., they are unable to operate outside of their training examples. In this talk, I will argue that existing training paradigms, which learn from curated datasets, scraped web data, and simulated environments, are responsible for these criticisms. Drawing on numerous social science fields, I will present a roadmap to socially situated agents --- agents that become less brittle by learning from interactions with people. However, to learn from interactions, we need to develop models that can process information into useful representations and communicate both what they already know and what they would like to learn.
First, I will introduce scene graphs, a symbolic representation inspired by Cognitive Science experiments on human perception. With scene graphs, I will develop compositional, few-shot vision models that can interpolate to novel input compositions. I will also demonstrate how to leverage Psychophysics to collect large-scale scene graphs to produce datasets like Visual Genome. Second, drawing on Linguistics and Information Theory, I will discuss mechanisms to describe visual inputs and ask questions: while traditional models generate generic questions when trying to learn new concepts, we can formulate question generation as an information-maximization optimization that generates richer, more diverse questions. Finally, my roadmap will culminate in a framework for developing socially situated agents that can simultaneously learn how to interact with people while learning from those interactions. I will describe how we instantiated such an agent and show how it learned to become less brittle over the course of 8 months by interacting with over 230K people on social media.
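Question selection by information maximization can be illustrated with a small discrete example (a sketch only; the talk's models are learned neural generators, and the names below are hypothetical): pick the question whose expected answer most reduces the agent's uncertainty over candidate concepts.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a {outcome: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_info_gain(prior, answer_model):
    """Expected entropy reduction over concepts after asking one question.

    prior: {concept: P(concept)}
    answer_model: {answer: {concept: P(answer | concept)}}
    """
    gain = entropy(prior)
    for answer, likelihood in answer_model.items():
        p_answer = sum(likelihood[c] * prior[c] for c in prior)
        if p_answer == 0:
            continue
        posterior = {c: likelihood[c] * prior[c] / p_answer for c in prior}
        gain -= p_answer * entropy(posterior)
    return gain

def best_question(prior, questions):
    """questions: {question_text: answer_model}; returns the most informative."""
    return max(questions, key=lambda q: expected_info_gain(prior, questions[q]))
```

A perfectly discriminative yes/no question over two equally likely concepts yields one bit of expected gain, while a question whose answer is independent of the concept yields zero; maximizing this quantity is what steers the agent away from generic questions.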
Ranjay Krishna is a 5th-year Ph.D. candidate at Stanford University, where he is co-advised by Professor Fei-Fei Li and Professor Michael Bernstein. His work lies at the intersection of computer vision, machine learning, and human-computer interaction. His research aims to design AI agents informed by human practices, goals, and capabilities. He is a teaching fellow at Stanford, where he teaches two classes: an undergraduate course on computer vision and a graduate course on convolutional neural networks. He holds a Master of Science in Artificial Intelligence from Stanford University, where he received the Christofer Stephenson Memorial Award for his thesis. Before that, he earned Bachelor of Science degrees in Electrical and Computer Engineering and in Computer Science from Cornell University. He has interned at Google AI, Facebook AI Research, Microsoft, and Yahoo Research.
Monday, October 12, 2020
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current methods are still limited on both fronts despite extensive efforts. In this paper, we introduce the Evolving Graphical Planner (EGP), a model that performs global planning for navigation based on raw sensory input. The model dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation. We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task through pure imitation learning, outperforming previous navigation architectures by up to 5%.
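EGP's planner is learned end-to-end over a graph it builds from raw sensory input; purely as an illustration of the underlying idea of global planning on a progressively constructed graph, a classical replanning step might look like the following (all names are illustrative, not the paper's API):

```python
import heapq

def plan(graph, start, goal):
    """Dijkstra shortest path on a graph stored as {node: {neighbor: cost}}.

    In a navigation setting the graph grows each step as new viewpoints
    are observed, and the agent replans from its current node.
    """
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the path by walking predecessors back from the goal
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```

The contrast with EGP is instructive: here the graph, action space, and costs are fixed and hand-specified, whereas EGP learns them from raw observations and language.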
Monday, October 19, 2020
Recurrent All-Pairs Field Transforms for Optical Flow and 3D Reconstruction
Many tasks in computer vision can be framed as correspondence problems subject to task-specific geometric constraints. Examples include optical flow, rectified stereo, multi-view stereo, SLAM, and scene flow. Optical flow, or dense correspondence, is the task of estimating per-pixel motion between a pair of frames. Despite a long history of progress, the best systems are still limited by difficulties including fast-moving objects, occlusions, blur, and textureless surfaces. In this talk, I will discuss our work on RAFT, a new network architecture for optical flow. RAFT extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit that performs lookups on the correlation volumes. I will also discuss our more recent efforts to apply RAFT to a broader set of 3D reconstruction problems, including rectified stereo, multi-view stereo, Sim(3) registration, and SLAM.
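The all-pairs correlation volume and the lookup step can be sketched concretely (a simplified single-scale sketch with illustrative names; RAFT's implementation pools the volume into a multi-scale pyramid and uses bilinear sampling):

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """4D correlation volume: dot product between every pixel pair.

    f1, f2: (H, W, D) per-pixel feature maps -> (H, W, H, W) volume.
    """
    return np.einsum("ijd,kld->ijkl", f1, f2)

def lookup(corr, flow, radius=1):
    """Sample correlation values in a window around each pixel's current
    flow estimate (nearest-neighbor version of RAFT-style lookup)."""
    H, W = corr.shape[:2]
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]
    feats = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys = np.clip(rows + flow[..., 1].astype(int) + dy, 0, H - 1)
            xs = np.clip(cols + flow[..., 0].astype(int) + dx, 0, W - 1)
            feats.append(corr[rows, cols, ys, xs])
    return np.stack(feats, axis=-1)  # (H, W, (2*radius + 1) ** 2)
```

The key design point is that the volume is built once, so each recurrent refinement step only pays for cheap indexed lookups rather than recomputing matching costs.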
Monday, October 26, 2020
Monday, November 02, 2020
REVISE: A Tool for Measuring and Mitigating Biases in Visual Datasets
Machine learning models are known to perpetuate and even amplify the biases present in their data. However, these data biases frequently do not become apparent until after the models are deployed. To tackle this issue and enable preemptive analysis of large-scale datasets, we present REVISE (REvealing VIsual biaSEs), a tool that assists in the investigation of a visual dataset, currently surfacing potential biases along three dimensions: (1) object-based, (2) gender-based, and (3) geography-based. Object-based biases relate to the size, context, or diversity of object representation. Gender-based metrics aim to reveal stereotypical portrayals of people of different genders. Geography-based analyses consider the representation of different geographic locations. REVISE sheds light on the dataset along these dimensions; the responsibility then lies with the user to consider the cultural and historical context and to determine which of the revealed biases may be problematic. The tool further assists the user by suggesting actionable steps that may be taken to mitigate the revealed biases. Overall, the key aim of our work is to tackle the machine learning bias problem early in the pipeline.
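REVISE computes a broader suite of metrics, but one of the simplest object-based signals, the typical normalized object size per class, could be computed along these lines (a sketch with illustrative names and data layout, not the tool's API):

```python
from collections import defaultdict
from statistics import mean

def object_size_report(annotations, image_sizes):
    """Mean normalized bounding-box area per class.

    Classes that only ever appear tiny (or huge) relative to the image
    are one simple object-size bias signal to surface to the user.

    annotations: iterable of (image_id, class_name, (x, y, w, h)) boxes
    image_sizes: {image_id: (width, height)}
    """
    areas = defaultdict(list)
    for image_id, cls, (x, y, w, h) in annotations:
        iw, ih = image_sizes[image_id]
        areas[cls].append((w * h) / (iw * ih))
    return {cls: mean(vals) for cls, vals in areas.items()}
```

As the abstract emphasizes, a metric like this only surfaces a pattern; whether a skewed size distribution is problematic is a judgment left to the user.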
Monday, November 09, 2020
Monday, November 23, 2020