Probabilistic Programming
We have a bunch of observations $\theta_0, \ldots, \theta_n$:
Is it raining at Tanah Merah?
Is it raining at Pasir Ris?
Is it raining at Changi Airport?
These observations are incomplete
We don't directly measure wind patterns, humidity, etc.
All of machine learning is estimating $P(X \mid \theta_0, \ldots, \theta_n)$
If we know it's raining in Pasir Ris and Changi, how likely is it to rain on campus in the next 5 minutes?
Tables vs. Graphs
| Is it raining in Changi? | Is it raining in Tanah Merah? | Is it raining in Pasir Ris? | How often did it rain on campus 5 minutes later? |
|---|---|---|---|
| Yes | Yes | Yes | 95% |
| Yes | Yes | No | 94% |
| Yes | No | Yes | 93% |
| Yes | No | No | 90% |
| No | Yes | Yes | 72% |
| No | Yes | No | 71% |
| No | No | Yes | 70% |
| No | No | No | 0.2% |
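For concreteness, here's that table as a plain Python lookup (probabilities copied from the rows above; the tuple encoding of the three stations is just an illustrative choice):

```python
# Keys: (raining_in_changi, raining_in_tanah_merah, raining_in_pasir_ris)
# Values: P(rain on campus in the next 5 minutes), from the table above
p_rain = {
    (True,  True,  True):  0.95,
    (True,  True,  False): 0.94,
    (True,  False, True):  0.93,
    (True,  False, False): 0.90,
    (False, True,  True):  0.72,
    (False, True,  False): 0.71,
    (False, False, True):  0.70,
    (False, False, False): 0.002,
}

print(p_rain[(False, True, True)])  # 0.72: dry at Changi, wet elsewhere
```

The catch: the table needs a row for every combination of observations, so it grows exponentially as we add stations.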
Bayes' Theorem
$$P(X|\theta) = \frac{P(\theta|X)P(X)}{P(\theta)}$$
As we collect data, our guess for $P(X|\theta)$ gets more accurate
This is a hierarchical Bayesian model
You collect observations to "fit" the probability distribution of each node
The less flexibility in your model, the less data you need
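A minimal worked example of the theorem for the rain scenario (all three input probabilities are made-up illustrative numbers, not measurements):

```python
# Hypothetical inputs for P(rain on campus | rain at Changi)
p_x = 0.30               # prior: P(rain on campus)
p_theta_given_x = 0.80   # likelihood: P(rain at Changi | rain on campus)
p_theta = 0.35           # evidence: P(rain at Changi)

# Bayes' theorem: P(X|theta) = P(theta|X) * P(X) / P(theta)
posterior = p_theta_given_x * p_x / p_theta
print(posterior)         # ~0.686
```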
Probabilistic Programming
- Our model is a probability distribution, from which we can draw samples
- We can generalize beyond a DAG
- You've probably done this before...
Probabilistic Programming
- Generally, we make a probabilistic program with some free parameters
- We estimate the output distribution of the program
- Then we use an optimizer to tweak the free parameters so the distribution is likely to produce our data!
- Example: regression, sketched below
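A minimal sketch of that recipe as a PyMC3 linear regression (the priors, synthetic data, and variable names are assumptions for illustration):

```python
import numpy as np
import pymc3 as pm

# Synthetic data: noisy observations of y = 2x + 1
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + np.random.normal(0, 0.1, size=50)

with pm.Model():
    # The free parameters of our probabilistic program
    slope = pm.Normal("slope", mu=0, sigma=10)
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    noise = pm.HalfNormal("noise", sigma=1)

    # The program's output distribution, conditioned on our data
    pm.Normal("y_obs", mu=slope * x + intercept, sigma=noise, observed=y)

    # Rather than a hand-rolled optimizer, sample the posterior over
    # the parameters most likely to have produced the data
    trace = pm.sample(1000)
```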
Why do we care?
- We can build complex models that are hard to translate into existing ML structures
- This lets us use less data and training time, by integrating our expert knowledge!
- Lots of existing research, libraries and languages for building distributions and estimating the output
How do you use it?
- Church: LISP-y, made for cognitive science and designed to make "human" cognition easy to model, VERY alpha-state
- PyMC3: a Python library, handles DAGs easily and efficiently
- WebPPL: a subset of JavaScript, probably the best for general usage
Neural Networks and Computational Graphs
We can view these techniques as computational graphs
Traditional neural networks
- Each layer has a weight matrix $W$ which scales and sums the outputs of the previous layer
- In probabilistic programming terms, these $W$s are the part we optimize
- Things we don't change during training are called hyperparameters
Each neuron has a non-linear response to the weighted sum
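As a sketch, here is that forward pass in NumPy (the layer sizes and ReLU non-linearity are arbitrary choices; the $W$s and biases are what an optimizer would tweak):

```python
import numpy as np

def layer(x, W, b):
    z = W @ x + b            # scale and sum the previous layer's outputs
    return np.maximum(z, 0)  # non-linear response (ReLU)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # 4 hidden -> 2 outputs

x = np.array([0.5, -1.0, 2.0])
output = layer(layer(x, W1, b1), W2, b2)
```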
How does our neural network adapt to this problem?
Can our neural network deal with this?
How could we make it work?
Deep learning
Just has a lot of big layers ¯\_(ツ)_/¯
Convolutional Neural Networks
During training, we tweak convolutional kernels that capture key features of our 2-D data
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
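To make "tweaking a kernel" concrete, here is a minimal 2-D convolution in NumPy, with a hand-written edge detector standing in for a learned kernel:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; each output pixel is the
    # weighted sum of the patch under the kernel
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-written vertical-edge kernel; training would learn these values
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])
image = np.random.default_rng(0).random((8, 8))
features = convolve2d(image, edge_kernel)
```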
Long Short Term Memory
Each neuron has a feedback loop, so it can remember previous examples
By BiObserver - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=43992484
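To make the feedback loop concrete, here is a plain recurrent step in NumPy; a real LSTM adds input, forget, and output gates on top of this, and all sizes here are arbitrary:

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    # The feedback loop: the new state depends on the previous state h
    return np.tanh(Wx @ x + Wh @ h + b)

rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):   # a sequence of 5 inputs
    h = rnn_step(x, h, Wx, Wh, b)   # h carries memory of earlier inputs
```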
Autoencoders
http://blog.fastforwardlabs.com/post/148842796218/introducing-variational-autoencoders-in-prose-and
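A minimal, untrained sketch of the encoder/decoder structure (random weights; training would adjust them to minimize reconstruction error):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 8)) * 0.1  # compress 8 features to a 2-D code
W_dec = rng.normal(size=(8, 2)) * 0.1  # reconstruct 8 features from the code

def encode(x):
    return np.tanh(W_enc @ x)          # the hidden layer (latent code)

def decode(z):
    return W_dec @ z                   # the reconstruction

x = rng.random(8)
z = encode(x)       # we can examine this hidden layer directly
x_hat = decode(z)   # training minimizes ||x - x_hat||^2
```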
How do we examine the hidden layer?
We can watch it learn :)
My Plan
Gather lots of descriptions of "good" design software, and train the system to produce text with similar product features
Treat design principles as conditional probabilities over sets of features, and hope the latent space picks up on that