Xiaodan Hu (胡晓丹)

I'm a Ph.D. candidate in the Computer Vision and Robotics Laboratory at the University of Illinois at Urbana-Champaign, working with Prof. Narendra Ahuja, specializing in computer vision and deep learning.

xiaodan9377@gmail.com     Google Scholar     Resume

Research Interests

    My current research interests are in multi-modal deep learning and computer vision, specifically in applying deep networks to learn features over multiple modalities (e.g., text, images, and audio) and modeling the relationships among the different modalities. Possible application areas include generating poetry from images, generating high-fidelity images from text descriptions, and generating dance videos from input actions.


Education

  • Ph.D. Candidate in the Computer Vision and Robotics Laboratory at the University of Illinois at Urbana-Champaign (UIUC)
    (2019 September - present), Supervisor: Narendra Ahuja
    Courses: CS547 Deep Learning, ECE544 Pattern Recognition, CS543 Computer Vision
  • Master of Applied Science in the Vision and Image Processing Lab at the University of Waterloo (UW)
    GPA: 90/100 (A+) (2017 Spring - 2019 April), Supervisor: Paul Fieguth
    Courses: SYDE780 Graphical Deep Learning, CS685 Machine Learning: Statistical and Computational Foundations, SYDE522 Machine Intelligence, SYDE675 Pattern Recognition, ECE613 Image Processing and Visual Communication, SYDE672 Statistical Image Processing
  • Master of Science at the Tandon School of Engineering, New York University (NYU)
    GPA: 3.53/4 (2015 September - 2017 June)
    Courses: CS-GY6313 Information Visualization, CS-GY6643 Computer Vision and Scene Analysis, CS-GY6233 Introduction to Operating Systems, CS-GY6133 Computer Architecture I
  • Bachelor of Engineering at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT)
    Major GPA: 87/100 (A) (2011 September - 2015 July)
    Courses: C++ Programming Fundamentals, Data Structures, Multimedia Communications, Java Programming

Papers

  • Squeeze-and-Attention Networks for Semantic Segmentation
    Zilong Zhong, Zhong Qiu Lin, Rene Bidart, Xiaodan Hu, Ibrahim Ben Daya, Zhifeng Li, Wei-Shi Zheng, Jonathan Li, Alexander Wong
    Accepted by CVPR 2020
    PDF
  • MUSE: Illustrating Textual Attributes by Portrait Generation
    Xiaodan Hu, Pengfei Yu, Kevin Knight, Heng Ji, Bo Li, Honghui Shi
    arXiv 2020
    PDF
  • RUNet: A Robust UNet Architecture for Image Super-Resolution
    Xiaodan Hu, Mohamed A. Naiel, Alexander Wong, Mark Lamm and Paul Fieguth
    Accepted as an oral presentation at the Women in Computer Vision Workshop at CVPR 2019 (CVPRW-WiCV 2019)
    PDF
  • ClearGAN: Photo-Realistic High-Resolution Text-to-Image Synthesis via Joint Inter-modal and Intra-modal Attention Modeling
    Xiaodan Hu, Paul Fieguth, Mohamed A. Naiel and Alexander Wong
    CVPR 2019 Workshop Language and Vision, accepted as poster and spotlight (CVPRW-Language & Vision 2019)
  • ProstateGAN: Mitigating Data Bias via Prostate Diffusion Imaging Synthesis with Generative Adversarial Networks
    Xiaodan Hu, Audrey Chung, Alexander Wong, and Paul Fieguth
    Accepted as a poster presentation at the NeurIPS 2018 Workshop on Machine Learning for Health (NIPSW-ML4H2018)
    PDF
  • Content-Adaptive Non-Stationary Projector Resolution Enhancement
    Xiaodan Hu
    Master's Thesis, 04/2018 - 04/2019
  • Robust Visual Enhancement of Moving Contents in Projected Imagery
    Xiaodan Hu, Mohamed A. Naiel, Zohreh Azimifar, Mark Lamm and Paul Fieguth
    Accepted as a poster presentation at 2019 Society for Information Display International Symposium, Seminar and Exhibition (SID2019)
    PDF
  • Device, system and method for enhancing one or more of high contrast regions and text regions in projected images
    Xiaodan Hu, Mohamed A. Naiel, Zohreh Azimifar, Ibrahim Ben Daya, Mark Lamm and Paul Fieguth
    U.S. Patent
    PDF
  • Projector Resolution Enhancement Using a Non-stationary Content-adaptive Scheme
    Xiaodan Hu, Mohamed A. Naiel, Zohreh Azimifar, Ibrahim Ben Daya, Mark Lamm and Paul Fieguth
    Journal of Signal Processing: Image Communication, in review (JSPIC)
  • Text Enhancement in Projected Imagery
    Xiaodan Hu, Mohamed A. Naiel, Zohreh Azimifar, Ibrahim Ben Daya, Mark Lamm and Paul Fieguth
    Accepted as a poster presentation at the Conference on Vision and Imaging Systems (CVIS2018), published in a special issue of Journal of Computational Vision and Imaging Systems (JCVIS)
    PDF
  • Motion Detection in High Resolution Enhancement
    Xiaodan Hu, Avery Ma, Ahmed Gawish, Mark Lamm, Paul Fieguth
    Accepted as a poster presentation at the Conference on Vision and Imaging Systems (CVIS2017), published in a special issue of Journal of Computational Vision and Imaging Systems (JCVIS)
    PDF
  • Application of Modular Approach in GIS-based Hydrological Modeling
    Shixiong Hu, He Jin, Xiaodan Hu, Yuannan Long
    Accepted by the 22nd International Conference on Geoinformatics (Geoinformatics 2014)
    PDF

Awards

  • Annual Conference on Vision and Intelligent Systems 2020, Session Chair
  • Annual Conference on Vision and Intelligent Systems 2019-2020, Technical Program Committee
  • New In ML at NeurIPS 2020, Reviewer
  • ISCAS 2020, Reviewer
  • Received a travel award to attend and present the work at WiCV at CVPR 2019
  • Received a student travel grant to attend and present the work at SID Display Week 2019
  • Received the Provost's Doctoral Entrance Award for Women, UW 2019
  • Certificate of Completion of the Fundamentals of University Teaching Program at UW, 2018
  • Graduate Research Studentship (GRS), UW 2017-2019
  • International Masters Student Award, UW 2017-2019
  • Faculty of Engineering Graduate Scholarship, UW 2018

Research Experience


Teaching Experience

  • Teaching Assistant, BME 393 Digital Systems, University of Waterloo (2019 January - 2019 April). Instructor: Prof. Parsin Haji Reza
    Responsible for overall management; organized community service and cooperated with local NPOs.

Employment Experience

  • Research Associate at the Vision and Image Processing (VIP) Lab, UW (2019 May - 2019 Aug). Supervisor: Paul Fieguth
  • Research Intern at Christie Digital Systems Canada Inc. (2017 March - 2018 August). Mentor: Mr. Mark Lamm
    Achieved content-adaptive high-resolution enhancement using a low-resolution projector; worked on text detection and motion detection.
  • Software Engineer at SnagTag Inc. (2016 May - 2016 August). Mentor: Mr. Jake Elliott
    Developed an app that integrates traditional clothing labels and presents virtualized information to customers when they pick up a product, triggered by the NFC tag attached to the product.

Projects

  • ProstateGAN: Mitigating Data Bias via Prostate Diffusion Imaging Synthesis with Generative Adversarial Networks
    Estimate the potential distribution of prostate imaging samples, and use conditional GAN techniques to augment the prostate imaging datasets based on the corresponding labels to increase the accuracy of prostate cancer classification.
  • ClearGAN: Fine-Grained Text to High Resolution Image Generation
    Improve the perceptual quality of generated images by considering both contextual loss and perceptual loss, and increase the synthesis resolution by applying a deconvolution network and a sub-pixel convolution layer.
  • Text Enhancement in Projected Imagery
    Improve the visual quality of projected imagery by enhancing text and non-text regions differently. Propose a text enhancement scheme based on a novel local dynamic range statistical thresholding.
  • Motion Estimation for High Resolution Enhancement
    Deep learning: Improve the spatial pyramid network (SPyNet) based on the idea of the temporal convolutional network (TCN) for motion estimation in high-resolution videos; train the TCN on image datasets to generate motion flow for videos.
    Classical: Propose Kalman-filter-based optical flow motion estimation methods to obtain accurate flow fields for videos; design directional blurring filters to suppress artifacts; video scene cut detection.
  • Weight Quantization on Accuracy in Pre-Trained MobileNets of Various Depth
    Evaluate the trade-off between accuracy and model size using pre-trained MobileNet networks with different hyperparameters for classifying traffic signs. Use quantized pre-trained MobileNets (with the last fully connected layer removed) to extract features, and train our own 32-bit and quantized classifiers. Cross-compare the changes in accuracy relative to model size.
  • Digital Pathology Image Classification
  • Feature Fusion for Different Face Recognition

Personal

  • I love travelling and food. I've been to most places in China, the east and west coasts of the United States, and eastern Canada. Here are some photographs I took during my travels.
  • I have a lasting passion for piano and have been playing since I was 5 years old. I have participated in a few local piano concerts and passed the level 10 piano grading test in China. In college, I also joined the choir as a soprano. I would love to share my favorite pieces that I played while working from home: Fantaisie-Impromptu, Minute Waltz, Spirited Away (千と千尋)
  • I love badminton, table tennis, volleyball, and swimming, and I have received specialized training, which keeps me fit and makes playing more fun. I also enjoy outdoor and extreme sports: I often go mountain climbing on long weekends, rafting in summer, and skiing in winter. For the past few years, I have been thinking about getting a diving certificate and a recreational pilot permit.
  • I enjoy being involved in volunteer activities, and I served as president of the volunteer association during my undergraduate studies. I still keep in touch with a school for deaf and mute students in Beijing.