Teaching
Universities exist because of their students. As a university professor, I strive to be a good teacher.
What makes a good teacher? I believe that a good mathematical science teacher must:
1) set a good example of loving mathematical science;
2) motivate students' learning and further stimulate their interest in the field; and
3) effectively deliver course material.
In my class, students can expect to see:
1) my excitement about doing mathematical science;
2) my constant encouragement to study mathematical science;
3) my handwritten class notes posted on Blackboard after each lecture.
I highly value homework assignments and believe that the number and difficulty
of the homework problems directly affect the effectiveness of students' practice.
Students who plan to take my class should expect to do weekly homework assignments with a reasonable amount of problems including some challenging ones.
In spring 2020, I will teach Analysis of Deep Learning (MF 3:30–4:45 PM, TLC 247).
Deep Learning is a machine learning technique that is penetrating almost all scientific areas that involve big data.
In this course, through some basic deep neural network models,
I will explain the roles of essential components of deep learning.
With the knowledge learned from this course, the students will be able to implement the common neural networks,
customize and design neural networks for their specific applications, and analyze the performance of the models.
This course will be taught in Python and PyTorch.
Prior experience with Python is helpful, but not required. A linear algebra (Math 330)
background is needed to understand the analysis in most lectures. Two of the lectures require Calculus III (Math 275),
and two require some background in probability and statistics (Stat 301).
Students who do not have any machine learning or Python background are advised to register for the Pass/Fail section (Math 40406). The topics are:
- Python and PyTorch basics for deep learning
- Machine learning basics: working with real data
- A minimal neural network with essential components, applied to the MNIST dataset
- Distances; cross-entropy loss and its probabilistic interpretation
- Backpropagation and the stochastic gradient descent method
- Choosing activation functions; customizing the loss function
- Neural network examples on time series data and voice data
- Convolutional layers
- Some popular models for image classification, applied to the CIFAR-10 dataset
- Analyzing neural network performance; uncertainty quantification
- Batch normalization, whitening, and techniques for preventing vanishing gradients
- Dimension reduction techniques and popular practices for avoiding overfitting
- Faster optimizers
- Generative Adversarial Networks (GANs)
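To give a taste of the "minimal neural network with essential components" topic above, here is a small sketch in plain NumPy (the course itself uses PyTorch). It wires together a ReLU hidden layer, softmax with cross-entropy loss, manual backpropagation, and one gradient descent step on random stand-in data; the dimensions, learning rate, and random data are illustrative choices of mine, not course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST: 64 flattened "images" of 784 pixels, 10 classes.
X = rng.standard_normal((64, 784))
y = rng.integers(0, 10, size=64)

# One hidden layer: 784 -> 32 -> 10, with small random initial weights.
W1 = rng.standard_normal((784, 32)) * 0.01
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 10)) * 0.01
b2 = np.zeros(10)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer
    logits = h @ W2 + b2
    z = logits - logits.max(axis=1, keepdims=True)   # stabilized softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return h, p

def cross_entropy(p, y):
    # Negative log-likelihood of the true class under the model
    return -np.log(p[np.arange(len(y)), y]).mean()

h, p = forward(X)
loss0 = cross_entropy(p, y)

# Manual backpropagation: softmax + cross-entropy gives p - onehot(y).
grad_logits = p.copy()
grad_logits[np.arange(len(y)), y] -= 1.0
grad_logits /= len(y)

gW2 = h.T @ grad_logits
gb2 = grad_logits.sum(axis=0)
grad_h = grad_logits @ W2.T
grad_h[h <= 0] = 0.0                   # gradient of ReLU
gW1 = X.T @ grad_h
gb1 = grad_h.sum(axis=0)

# One (batch) gradient descent step.
lr = 0.1
W1 -= lr * gW1; b1 -= lr * gb1
W2 -= lr * gW2; b2 -= lr * gb2

_, p_new = forward(X)
loss1 = cross_entropy(p_new, y)
print(loss0, loss1)   # the batch loss should decrease after the step
```

In the course, the same components appear as PyTorch layers, loss functions, and optimizers, with backpropagation done automatically.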
Research
I am currently working on problems in two different areas:
1. Metric entropy and its applications in nonparametric estimation
Consider the following problem: how many fire stations are needed in New York City so that every house can be reached by a fire truck within one minute
of receiving a fire alarm? The problem is complicated, because the time a fire truck takes to travel from A to B depends on road conditions,
traffic, weather, events, and many other things. If we redefine the distance between A and B as the expected traveling time, the problem can then be formulated as follows:
given a bounded (two-dimensional) set in a metric space, how many balls of radius r in this metric space are needed to cover the set?
The latter is a typical metric entropy problem, except that in metric entropy, the set is usually high-dimensional or infinite-dimensional
(such as a set of functions) on which the distance is clearly defined.
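For a finite set of points, the covering question above can be explored with the standard greedy construction: repeatedly pick an uncovered point as a new center until every point is covered. The sketch below is illustrative (the function name greedy_cover and the grid example are mine); the number of centers it returns is an upper bound on the covering number N(r).

```python
import math

def greedy_cover(points, r, dist):
    """Greedily pick centers until every point lies within distance r
    of some center. The count of centers upper-bounds the covering
    number N(r) of the point set."""
    centers = []
    uncovered = list(points)
    while uncovered:
        c = uncovered[0]           # any uncovered point becomes a center
        centers.append(c)
        uncovered = [p for p in uncovered if dist(p, c) > r]
    return centers

# Example: a grid in the unit square under the Euclidean metric.
euclid = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
centers = greedy_cover(grid, 0.25, euclid)
# Every grid point is within r = 0.25 of some center.
assert all(min(euclid(p, c) for c in centers) <= 0.25 for p in grid)
print(len(centers))
```

The same greedy argument also shows the centers are pairwise more than r apart, which is why covering and packing numbers are comparable.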
Here is an open problem: how many probability distributions on the d-dimensional unit cube should be sampled,
so that any unknown probability distribution on the d-dimensional unit cube is within L2 distance r of one of the distributions you have sampled?
(For the best known results, please see Metric Entropy of High-Dimensional Distributions.)
This is only one of many examples of the open problems I am interested in. These problems have one thing in common:
they have applications in nonparametric estimation in statistics. Indeed, metric entropy gauges the geometric complexity of these function spaces, and the geometric complexity determines the best rate of convergence a statistical estimator can possibly achieve.
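The link between metric entropy and estimation rates can be stated precisely. One standard formulation, due to Yang and Barron, determines the minimax rate from a balance between entropy and sample size; the display below is a sketch of that general relation, not a result specific to my work:

```latex
% Let N(\epsilon) denote the \epsilon-covering number of the function
% class F under the relevant metric, so \log N(\epsilon) is its metric
% entropy. The minimax rate of estimation \epsilon_n over F from n
% observations solves the balance equation
\log N(\epsilon_n) \asymp n\,\epsilon_n^2 .
```

Richer classes have larger entropy, forcing a larger \(\epsilon_n\), i.e., a slower best possible rate.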
Motivated graduate students are welcome to join this area of research.
2. Deep learning
Deep neural networks are among the current top choices for analyzing high-dimensional data. Given a dataset,
what neural network architecture should we use to make the best predictions? What depth?
How many neurons? Currently, all of these choices are mainly determined by modelers' experience.
My group attempts to answer these questions using mathematical and statistical tools. We are working on three different but interrelated projects:
Approximation theory of neural networks: under various assumptions on the shapes of the level sets of the multivariate functions under consideration,
we study upper bounds on the network complexity.
Analytically architecting deep neural networks: given a high-dimensional dataset, we propose to use statistical analysis to quantify the structural complexity of the dataset, and then use this quantitative characterization of the data structure to design neural network architectures via approximation theory.
Representations of high-dimensional data: most current neural networks require the input data to be low-rank tensors. For example,
a multichannel image is input as a 3D tensor. In real-world applications,
effectively expressing the data in tensor form can be challenging.
For example, there does not seem to be a ready-to-use, effective way to express a
protein structure or a nanoporous Metal-Organic Framework (MOF) material in tensor form.
Also, in practice, expressing a categorical variable that takes many possible values
using the commonly used one-hot method can unnecessarily increase the neural network's complexity.
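The one-hot blow-up can be made concrete with a quick parameter count. The numbers below (10,000 category levels, a 256-unit first layer, a 32-dimensional embedding) are illustrative assumptions of mine, not figures from our work; the comparison is between feeding one-hot vectors directly into the first layer versus inserting a low-dimensional embedding first.

```python
# A categorical variable with many levels (say, 10,000 distinct values),
# feeding a first hidden layer of 256 units.
num_levels = 10_000
hidden = 256
embed_dim = 32    # illustrative embedding size

# One-hot input: the first layer needs num_levels x hidden weights.
onehot_params = num_levels * hidden

# Embedding lookup (num_levels x embed_dim table) plus a first layer
# acting on the embedded vector (embed_dim x hidden weights).
embed_params = num_levels * embed_dim + embed_dim * hidden

print(onehot_params, embed_params)   # prints 2560000 328192
```

The embedding route cuts the first-stage parameter count by nearly a factor of eight in this example, which is one reason a unified, compact tensor representation matters.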
In this area, our group aims to find a unified approach to a tensor representation of the data
that keeps the essential structural complexity of the original data. As a specific application of this
approach, we are developing a tensor representation of MOF materials.
Currently, the group has four members: Frank Gao (PI), Boyu Zhang (postdoc), Daniel Furman (graduate student), and Zachary Sugano (undergraduate student), supported by an NSF grant. We are looking for one more undergraduate research assistant.
