There’s an important new paper out by Larry Wasserman et al. that describes a very general technique, called Universal Inference, for constructing statistical hypothesis tests and confidence intervals. In the traditional theory of statistics, such as would be taught in an undergraduate mathematical statistics course, a standard way hypothesis tests are constructed and analyzed is… Read More Universal Inference: Review Part I
In the previous post, we demonstrated how to efficiently compute co-occurrences with matrix algebra and use those calculations to recommend books to users. Though we saw some sensible recommendations come out of this approach, it also suffers from a number of issues, including: The Gatsby Problem: popular books tend to be overrepresented in the… Read More Recommendation Systems: From Co-occurrence Counts to Probabilities
In the previous post of the series, we developed a co-occurrence model for book recommendations. The model is similar to Amazon’s highly successful “Customers who bought…” feature. Now it’s time to apply this simple model to some real data to make recommendations. Preprocessing: As usual, we’ll work with the Goodreads dataset. I described the structure… Read More Recommendation Systems: A Co-occurrence Recommender
I recently read Kritzman and Li’s clever 2010 paper “Skulls, Financial Turbulence, and Risk Management.” Kritzman and Li characterize financial turbulence as a period where established financial relationships uncouple, prices swing, and market predictions break down. Does that sound like financial markets in 2020? Yup. So I thought it would be interesting to take a… Read More Financial Turbulence: Off the Chart
In this first post in a series on recommendation systems, we’re going to develop a powerful but highly intuitive representation for user behavior that will allow us to easily make recommendations. Since we’re going to be making heavy use of the Goodreads dataset in the series, we’ll formulate our basic recommendation system problem as… Read More Recommendation Systems: Co-occurrence Calculations
Today, I am announcing a series of posts I am developing about recommendation systems. The series is aimed at software/machine learning engineers. I have two goals for the series: (1) provide practical and implementable strategies for delivering recommendations in real time, and (2) present the mathematical intuition behind recommender problems. The reason for the first goal is that I… Read More A Practical Series on Recommendation Systems
If you are reading this, you probably already know that data pre-processing is the 90% perspiration of machine learning. You might love it or you might dread it, but you probably don’t think of it as the part of ML where the most interesting mathematics lives. Let me challenge that view a bit with… Read More Can you norm rows and standardize columns at the same time?
On February 28, I presented at the University of Kentucky’s Mathematics Department Alumni Day. My talk contains practical advice for math students (graduate and undergraduate) to prepare for Machine Learning careers.
I’m working through Wasserman’s All of Nonparametric Statistics, a wonderful and concise tour of nonparametric techniques. What is nonparametric statistics? It is a collection of estimation techniques that make as few assumptions as possible about the distribution from which your data came. Let’s work through an example in R that’s mentioned in Chapter 3 of… Read More A Jackknife Example
In the past, I wrote frequently about quadratic programming, especially in R, for example here and here. It’s been a while, and at least one great new library has emerged since my last post on quadratic programming: OSQP. OSQP introduces a new technique called operator splitting which offers significant performance improvements over standard interior… Read More Sparse quadratic programming with osqp