Research LCDM@UIUC

Star-galaxy Classification Using Deep Convolutional Neural Networks

Most existing star-galaxy classifiers use the reduced summary information from catalogs, requiring careful feature extraction and selection. The latest advances in machine learning that use deep convolutional neural networks allow a machine to automatically learn the features directly from data, minimizing the need for input from human experts. We present a star-galaxy classification framework that uses deep convolutional neural networks (ConvNets) directly on the reduced, calibrated pixel values. Using data from the Sloan Digital Sky Survey (SDSS) and the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS), we demonstrate that ConvNets are able to produce accurate and well-calibrated probabilistic classifications that are competitive with conventional machine learning techniques. Future advances in deep learning may bring more success with current and forthcoming photometric surveys, such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), because deep neural networks require very little manual feature engineering.


Deep Convolutional Generative Adversarial Networks (DCGAN) on Images

Small images (128x128) can now be generated with high quality, but generating larger images is still hard. We apply a deep convolutional generative adversarial network (DCGAN) and a variational auto-encoder (VAE) to SDSS images (2048x1024) to generate high-quality fake images. A DCGAN consists of two parts, a generator and a discriminator, which are trained at the same time; the generator can learn a little faster than the discriminator. The whole model can be viewed as a two-player game: the generator outputs a fake image and sends it to the discriminator, and the discriminator must decide whether the image is real or fake. Through this procedure, each layer learns to generate lower-level representations conditioned on higher-level representations.
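
The game between the two networks is usually written as a minimax objective. The form below is the standard GAN value function, given here only as a reference point; the project description does not state which loss variant was actually used. Here G is the generator, D the discriminator, x a real image, and z a random noise vector fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator raises V by assigning high probability to real images and low probability to generated ones, while the generator lowers V by producing images that the discriminator scores as real.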

Stock Return Prediction

Predicting stock returns for particular companies based on the release of SEC forms. I'm comparing various text representations, such as tf-idf bag-of-words and Doc2Vec, as well as different ML classifiers, such as logistic regression, random forest, and SVM. The project involves processing fairly large data sets (100GB+) as well as extracting useful sentiment information from very noisy, low-information text data.
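
As a rough illustration of the tf-idf bag-of-words representation mentioned above, the sketch below builds tf-idf vectors using only the Python standard library. It is not the project's pipeline: the function name `tfidf_vectors`, the whitespace tokenizer, and the unsmoothed textbook weighting (term frequency times log(N / document frequency)) are all assumptions made for the example, and library implementations typically use smoothed variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) with one tf-idf vector per document.

    Hypothetical helper for illustration: lowercases, splits on whitespace,
    and uses tf = count / doc length, idf = log(n_docs / doc frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Made-up example texts, not real filings.
vocab, vectors = tfidf_vectors(["revenue rose", "revenue fell"])
```

A token that appears in every document (here "revenue") gets weight zero, which is the property that makes tf-idf useful for downweighting boilerplate words. The resulting vectors could then be passed to any of the classifiers listed above.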

Time Series Analysis on Stock

Applying stochastic variational inference (SVI) to fit the parameters of a Hidden Markov Model to time series data for a single stock. SVI scales well to large data sets because each iteration of the algorithm operates on a mini-batch of the data rather than on the entire data set. Hidden Markov Models are used because they separate a hidden state sequence, which we cannot see, from the observations it produces, which we can, much like real stock data. The hope is that, once its parameters are fitted, the model will describe the data accurately.
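
To make the hidden-versus-observed split concrete, the sketch below evaluates the log-likelihood of an observed sequence under an HMM with Gaussian emissions, using the scaled forward algorithm. It is not an SVI implementation and not the project's code: it only shows the quantity a fitting procedure would try to improve, and the Gaussian emission model, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hmm_log_likelihood(obs, init, trans, means, stds):
    """Scaled forward algorithm: log p(obs) for a Gaussian-emission HMM.

    init[s]     : probability of starting in hidden state s
    trans[r][s] : probability of moving from state r to state s
    means, stds : per-state emission parameters (assumed Gaussian)
    """
    n_states = len(init)
    # Forward message at t = 0: prior times emission density.
    alpha = [init[s] * gaussian_pdf(obs[0], means[s], stds[s])
             for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * gaussian_pdf(obs[t], means[s], stds[s])
                for s in range(n_states)
            ]
        # Rescale to avoid underflow; the scale factors accumulate log p(obs).
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik


# Hypothetical two-regime model of daily returns: calm vs. volatile.
returns = [0.001, -0.002, 0.015, -0.020, 0.003]
ll = hmm_log_likelihood(
    returns,
    init=[0.5, 0.5],
    trans=[[0.95, 0.05], [0.10, 0.90]],
    means=[0.0, 0.0],
    stds=[0.005, 0.02],
)
```

A mini-batch method such as SVI would compute quantities of this kind on short sub-sequences of the series rather than on the full history in every iteration.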

Star-galaxy Classification Using Semi-Supervised Generative Adversarial Networks

Probing the application of generative adversarial networks (GANs) to the star-galaxy classification problem in a semi-supervised setting. Using data from the Sloan Digital Sky Survey (SDSS), we demonstrate that semi-supervised GANs are able to produce accurate and well-calibrated classifications using only a small number of labeled examples.


A simple and fast method for computing the Poisson binomial distribution function

It is shown that the Poisson binomial distribution function can be efficiently calculated using simple convolution-based methods. The Poisson binomial distribution describes how the sum of independent but not identically distributed Bernoulli random variables is distributed. Due to the intractability of the Poisson binomial distribution function, efficient methods for computing it have been of particular interest in the statistical literature. First, it is demonstrated that simply and directly using the definition of the distribution function of a sum of random variables can calculate the Poisson binomial distribution function efficiently. A modified, tree-structured Fourier transform convolution scheme is then presented, which provides even greater gains in efficiency. Both approaches are shown to outperform the current state-of-the-art methods in terms of accuracy and speed. The methods are then evaluated on a real-data image processing example in order to demonstrate the efficiency advantages of the proposed methods in practical cases. Finally, possible extensions for using convolution-based methods to calculate other distribution functions are discussed.
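
The direct approach can be sketched in a few lines: the distribution of a sum of independent variables is the convolution of their individual distributions, so the Poisson binomial probabilities can be built up by convolving in one Bernoulli variable at a time. The code below is a minimal illustration of that idea, not the paper's implementation; the function names are hypothetical, and the tree-structured Fourier transform scheme is not shown.

```python
def poisson_binomial_pmf(probs):
    """Return [P(S = 0), ..., P(S = n)] for S = sum of independent Bernoulli(p_i).

    Hypothetical helper: convolves the running distribution with each
    two-point Bernoulli distribution [1 - p, p] in turn, costing O(n^2).
    """
    pmf = [1.0]  # an empty sum equals 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: sum stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: sum becomes k + 1
        pmf = nxt
    return pmf


def poisson_binomial_cdf(probs):
    """Return [P(S <= 0), ..., P(S <= n)] as running sums of the pmf."""
    cdf, total = [], 0.0
    for mass in poisson_binomial_pmf(probs):
        total += mass
        cdf.append(total)
    return cdf


# Example with made-up success probabilities.
pmf = poisson_binomial_pmf([0.1, 0.2, 0.3])
cdf = poisson_binomial_cdf([0.1, 0.2, 0.3])
```

When all probabilities are equal, the result reduces to the ordinary binomial distribution, which gives a simple sanity check for any implementation.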

Vizic: A Jupyter-based Interactive Visualization Tool for Astronomical Catalogs

The ever-growing datasets in observational astronomy have challenged scientists in many respects, including efficient and interactive data exploration and visualization. Many tools have been developed to confront this challenge. However, they usually focus either on displaying the actual images or on visualizing patterns within catalogs in a predefined way. In this paper we introduce Vizic, a Python visualization library that builds the connection between images and catalogs through an interactive map of the sky region. Vizic visualizes catalog data over a custom background canvas using the shape, size and orientation of each object in the catalog. The objects displayed in the map are highly interactive and customizable compared to those in the observation images. These objects can be filtered or colored by their property values, such as redshift and magnitude. They can also be sub-selected using a lasso-like tool for further analysis with standard Python functions, and everything is done from inside a Jupyter notebook.