Toward a New Great Compromise: How I would change the Constitution

I recently completed a project in which I rewrote the US Constitution. The changes are designed to fix several problems in the design of the Senate and the administrative state. When I started thinking about how the design of the Senate is flawed, I found out that George Mason was saying some of the very same things back at the Constitutional Convention of 1787!

I’ve posted it in several places for public comment. I believe that the new executive council will fill executive offices better than the Senate does in its current role, and will help provide a more harmonious and prosperous future. I’d like to know what you think!

Read Toward a New Great Compromise: A Constitutional Plan to Promote Liberty, Renew Democracy, and Restore Federalism

If you want to hear me talk about this more, I explain it in the following podcast episode:
Rewriting the Constitution: Did the Founders Screw Up the Senate?

I talked about an update I made to the Veto Powers a few weeks later.

And if you want to see my first episodes on the subject, check out the 3-parter from this past summer (2023):
Max Changes the Constitution: Part 1
Max Changes the Constitution: Part 2
Max Changes the Constitution: Part 3

Revisiting the Dirichlet-Multinomial after 10 years

I recently shared the following to my research list:
(See the latest research at localmaxradio.com/labs)

It’s been over a decade since I started building what is now the Foursquare City Guide venue ratings. This is a system applied to restaurants, cafes, parks, and the like to find out what’s good. One question that came up was: “how many ratings do we need before we have enough information to know the quality of a place?”

In pursuit of an answer, I made use of the Dirichlet-multinomial probability distribution. The idea was that, given a bunch of count data, I could come up with a prior probability distribution that applies to each row (or venue) before seeing its ratings. This tells us how much we should let the data inform our beliefs.

In Bayesian terms, this is an “informative prior”.
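
To make the idea concrete, here is a minimal sketch of the pseudo-count arithmetic behind a Dirichlet prior (the two-category split and the prior values below are made up for illustration; the real system used more rating categories):

```python
import numpy as np

def posterior_means(counts, alpha):
    """Posterior mean probabilities for one venue under a Dirichlet prior.

    counts: observed rating counts per category, e.g. [positive, negative]
    alpha:  Dirichlet prior parameters fit across all venues.
    The prior acts like pseudo-counts: a venue with few ratings stays
    close to the global average, while one with many ratings is
    dominated by its own data.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

# Hypothetical prior: roughly 70% positive overall, worth 10 pseudo-ratings.
alpha = [7.0, 3.0]
print(posterior_means([2, 0], alpha))    # 2 ratings: stays near the prior
print(posterior_means([90, 10], alpha))  # 100 ratings: the data dominates
```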

I found this simple model to be useful in many other situations. A good example is calculating the seasonality of different food items, which I presented at RecSys in 2014.

Because computation time is expensive, I wanted a method to calculate these priors quickly. So, I found a way to efficiently compress the data, thereby simplifying the computation. I ended up implementing this in a blazing fast Python script.
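
To give a flavor of how such a compression can work (this is my sketch of the general trick, not necessarily the paper's exact scheme): the digamma differences that show up in the Dirichlet-multinomial gradient expand into sums of reciprocals, since ψ(α + n) − ψ(α) = 1/α + 1/(α+1) + … + 1/(α+n−1). That means millions of rows can be collapsed into short per-category "tail count" arrays, and each gradient evaluation then costs time proportional to the largest count rather than the number of rows:

```python
import numpy as np

def compress(counts_by_row):
    """Collapse rows of count data into per-category tail counts:
    tails[k][j] = number of rows whose count in category k exceeds j."""
    counts = np.asarray(counts_by_row)
    tails = []
    for k in range(counts.shape[1]):
        col = counts[:, k]
        tails.append(np.array([(col > j).sum() for j in range(col.max())]))
    return tails

def gradient_term(alpha_k, tail_k):
    """Sum over all rows of psi(alpha_k + n) - psi(alpha_k), computed
    from the compressed tail counts via the reciprocal-sum identity."""
    j = np.arange(len(tail_k))
    return float(np.sum(tail_k / (alpha_k + j)))
```

(The same trick applies to the row-total term involving the sum of the alphas, using tail counts of the row sums.)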

To make these ideas available quickly, I shared the work on arXiv and gave a talk about it at the New York Machine Learning meetup. Almost ten years later I still get contacted regarding this work, so I decided to make some updates to the code and the paper, fixing a few mistakes and making the language more precise. The remaster!

I also created an addendum detailing the transformation of the equations for Newton’s method. There, I describe how the Hessian matrix can be inverted efficiently in this particular situation. An interesting twist to this algorithm is doing it in "log space", which makes sense since Dirichlet parameters are always positive.
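
To illustrate why that inversion can be efficient: for the Dirichlet log-likelihood, the Hessian is a diagonal matrix plus a rank-one term, and such systems can be solved in O(K) with the Sherman-Morrison formula instead of O(K³) with a general inverse. The function below is my generic sketch of that trick; the addendum's exact derivation, including the log-space chain-rule adjustments, differs in its details:

```python
import numpy as np

def solve_diag_plus_rank_one(q, c, g):
    """Solve (diag(q) + c * ones @ ones.T) x = g in O(K) time.

    Sherman-Morrison with D = diag(q):
        x = D^-1 g - [c * sum(D^-1 g) / (1 + c * sum(D^-1 1))] * D^-1 1
    This is the shape of a Newton step when the Hessian is a diagonal
    matrix plus a rank-one correction.
    """
    inv_g = g / q        # D^-1 g
    inv_1 = 1.0 / q      # D^-1 1
    coef = c * inv_g.sum() / (1.0 + c * inv_1.sum())
    return inv_g - coef * inv_1
```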

I love sharing insights about these subjects, and I hope you enjoyed this one. Have a great summer!

Practicing my Python on Multinomial Mixture Models

I just revived a fun module related to the multinomial mixture model in my good old Bayesian package, BayesPy: https://github.com/maxsklar/bayespy

Multinomial mixture models are much simpler than LDA, but they run at lightning speed and can give you an idea of how, or whether, the data is clustered. They can be run on datasets of category-count data. I think of it as K-means on a probability simplex.

Just run:

python3 MultinomialMixture/writeSampleModel.py -A 0.3,0.3,0.3 -m 2,2 | python3 MultinomialMixture/writeSampleDataset.py -N 10000 -M 500 | python3 MultinomialMixture/inferMultinomialMixture.py -K 3 -C 2

It generates a random data set and attempts to figure out the multinomial mixture on the fly. Of course, inferMultinomialMixture.py can also be run on ANY applicable dataset.

There is, I'm sure, really short code on PyMC3 that can accomplish this as well, but it's fun to revive the pure Python approach every once in a while.
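
For anyone curious what the "K-means on a probability simplex" idea looks like in code, here is a toy EM sketch in NumPy. It is my own minimal version for illustration - the BayesPy scripts above add priors, a CLI, and more careful numerics:

```python
import numpy as np

def multinomial_mixture_em(X, K, iters=50, seed=0):
    """Toy EM for a mixture of K multinomials over count rows X (N x C)."""
    rng = np.random.default_rng(seed)
    N, C = X.shape
    pi = np.full(K, 1.0 / K)                    # mixture weights
    theta = rng.dirichlet(np.ones(C), size=K)   # per-cluster category probs
    for _ in range(iters):
        # E-step: responsibility of each cluster for each row
        logp = X @ np.log(theta).T + np.log(pi)     # (N, K) log scores
        logp -= logp.max(axis=1, keepdims=True)     # for numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and category probabilities
        pi = r.mean(axis=0)
        theta = r.T @ X + 1e-9
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta
```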

9 Tips for Participating in a Public Debate

So you recently agreed to do a formal debate, and you have settled on a resolution. If you are like me, you may be thinking “WHAT have I gotten myself into?”.

I recently participated in a public debate for the “Live Free and Debate” series in Rollinsford, New Hampshire. I argued against the resolution “Monarchy is better than Representative Democracy” for a full 90 minutes.

Participating in a debate is like no other form of presentation that I’ve ever experienced. It is unpredictable, it causes you to constantly rethink your position on the fly, and inspires a slight paranoia about how the audience is following your arguments. What a mind-spinner!

Afterwards, I brainstormed ways to prepare for one of these in the future in order to have a successful and enjoyable debate performance. Future me will be using this for reference, and I hope you will find this helpful as well.

1) Your primary mission is to entertain the audience.

This first point may be controversial, but remembering this will make everyone better off. You are there to get everyone on your side and crush your opponent. Right? Not so fast!

There may come a time where you need to disseminate information in a dry and dispassionate manner as part of some formal process, but this is not one of those times. Do not forget that audience members decided to give up their valuable leisure time in their busy lives to come out and see you debate. Do they hope to learn something new? Absolutely! Are they going to learn something new if they are bored to death? Not a chance!

Think of some ways to make your topic entertaining. Is it fun? Or is it mind-bending? Are there any compelling lines that can be made about it? A good place to start with entertainment and potentially humor is to look at the aspects of the debate that make people uncomfortable. Why do people disagree so vehemently on this topic?

I can’t give an exact answer on “how” to be entertaining and what would be appropriate. That all depends on the audience, the venue, and your own personal style. But I will say that if you keep the audience entertained - or even make a reasonable attempt at doing so - much else will be forgiven.

2) Know your audience

Knowing your audience is a pretty straightforward prerequisite for any presentation. Think about the venue that you are participating in. Where are they from and what media do they consume? What are likely to be their preconceived notions about the topic? What are their concerns with supporting one side versus another?

You may come up with several different types of people that you want to acknowledge and reach. It might be helpful to think or write out these “characters” and figure out what you would like to say to each one.

3) See what your opponent (and those on the same side as your opponent) have said about the issue in the past.

Suppose that your opponent comes up with a clever new argument that you have not considered before, and you have trouble coming up with a response on the fly. This is totally understandable. You may retreat to your other arguments, or you might think of something later in the debate.

But suppose alternatively that your opponent makes the classic argument in favor of their cause, or an argument that they personally have relied upon in the past. And you have nothing to say about it? How embarrassing!

You do not have time to get an entire PhD on the topic at hand, or read stacks of fully researched books and articles, critically examining all the details in each of them. But you DO have time to check out the main points. What are the typical arguments in favor of the other side? Be sure to have an answer to those.

If your opponent has said or written publicly on a topic in the past, then this is good news for you because you can go back and read it and be sure to prepare a response. If your opponent hasn’t said anything about it - then this is also good news for you because their notions haven’t been critically examined yet (at least publicly). Either way you win!

4) Gather arguments from diverse sources

Everyone thinks a little bit differently, and everyone responds to different types of arguments. A lot of people are going to come to you before the debate and claim that they have the “killer argument” that’s going to wipe your opponent away.

Don’t fall for it! Take that argument in and perhaps use it, but consider the fact that no one person has their finger on the pulse of the entire population, or even of the specific audience. Remember your audience profiles from part 2 - talk to several people who span these profiles. Talk to some people who are not in the audience and get their views.

I am reminded of the game “Family Feud” and how difficult it can be to guess how 100 people will respond to even a very simple question. This is why getting a cross-sectional response from a diverse panel is going to make you more prepared for anything that comes your way.

5) Edit, Edit, Edit!

When I created consumer products as a software engineer, one of the key but often neglected components was “editing”. Editing software means deleting things that you’ve written - and editing products means removing features that you’ve built!

Editing is often the difference between a successful project and an unsuccessful one. Without editing, you get feature bloat, mission creep, and technical debt. All of these fancy terms point to a failure to edit!

You will develop several different arguments in favor of your cause, and you may grow attached to some of them. But it’s important to be honest about which arguments are the strongest and the simplest. These are the ones that will most likely reach your audience, and a “throwing spaghetti against the wall until it sticks” approach is not as strong.

Some of your arguments and lines should appear on the “cutting room floor”. If you feel that you are being forced to cut something that you like for time and efficiency, this is an indication that you are doing a good job.

6) When answering audience questions, refer to ideas that you’ve already established.

Opening up the audience for questions is very dangerous. There will be people getting up on their soapboxes to give their pet theories, and there will be a lot of thinking on the fly. But there will also be a lot of insightful and tough questions.

This could be disconcerting, and I feared falling into two traps. The first is sounding like a broken record and repeating the same arguments over and over. “Why isn’t this getting through to people!?” The second is to try to make up some clever argument on the fly in order to avoid the first problem.

The reality is that most audience questions will have to do with something you’ve addressed, but you need to make a bridge between what you said and their specific objection or case. Instead of thinking “I said this earlier - what’s the problem”, think “let's take what I said earlier - and now here is how it applies to your specific question”.

This puts you in a better relationship with the audience member. Don’t forget that in order for people to properly grasp concepts, they need to see them applied to a few specific instances to get the big picture. That’s primarily what you’ll be doing during the Q&A.

7) Pick your battles.

You don’t need to respond to everything your opponent says, or everything an audience member says. If you’ve already stated an argument against it, or if it is a bit out of left field - it might be something the audience will discard as well.

With apologies for relying again on the “spaghetti against the wall” analogy - let some of the incoming spaghetti fall to the ground on its own.

8) In your conclusion, deploy the best stories to illustrate your main arguments.

The conclusion to the debate should not include new arguments - it should remind the audience of the main thrust of your arguments. You might make adjustments to this as the evening wears on, particularly in identifying the strongest examples for your arguments. But your main goal here is to summarize and to bring it all together.

It would be a mistake to make the conclusion an afterthought. In fact, it might even be helpful during your prep to think about how you envision your big ending - your final nail in the coffin - and then work backwards to your opening argument.

Don’t forget to entertain your audience and keep them engaged! Use your best stories and your best illustrative examples to tie together the arguments that you’ve already made. This will leave a big impression, and in many cases will decide the outcome of the debate.

9) Chill out and remember good sportsmanship

Your opponent, like you, is there to take the extraordinary risk of presenting ideas and holding them up to public scrutiny. You will lay out ideas against each other, and win or lose, both sides will learn and improve (and therefore win). Keep in mind that even in a one-sided result, it is never the end of the discussion. Both sides can go back, reexamine their arguments, and continue.

You should have a lot of respect for your opponent, and never underestimate them. They put themselves in a stressful and socially-risky situation just like you did.

These are my 9 initial takeaways from the debate. What do you think? Anything that I missed?

RESEARCH: Bias Correction for Supervised Machine Learning

I am thrilled to share that after a ton of work I recently published a research paper on arXiv called Sampling Bias Correction for Supervised Machine Learning. A local link is available, as well as a Local Maximum podcast episode (218).

The paper shows that if you receive only a portion of a dataset for machine learning, and you know the mechanism by which the original data was abridged, you can work out the formulas for learning on the original dataset.

The lost data will make your results less certain, but at least the bias can be counteracted in a principled way.

I originally wrote this because of a problem that I worked out when I was building the attribution product for Foursquare in 2017. We were using machine learning to predict the likelihood of people visiting places. Because the set of “non-visit” examples was humongous, we sampled it at a much lower rate than the “visit” examples. I researched the mathematics of sampling bias to counteract this.
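
To make this concrete, here is the best-known special case of such a correction, written for illustration (the paper derives the general formulas; see sections 5 and 6). If the negative examples are kept with probability w, a probability predicted by a model trained on the sample can be mapped back to the full-data scale by shifting the log-odds by ln(w):

```python
import math

def correct_downsampled_prob(p_hat, neg_keep_rate):
    """Map a probability from a model trained with negatives downsampled
    at rate w back to the full-data scale.

    The sample inflates the odds of a positive by a factor of 1/w, so
    the true odds are w times the sampled odds - equivalently, subtract
    ln(1/w) from the logit.
    """
    odds = neg_keep_rate * p_hat / (1.0 - p_hat)
    return odds / (1.0 + odds)

# A model that says 50/50 on 1%-downsampled negatives really means ~1%.
print(correct_downsampled_prob(0.5, 0.01))
```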

I encourage anyone who is interested in machine learning, or computer science in general, to check out the introduction. It’s a quick read, and it presents the high-level ideas in our field.

Section 3 is really just “machine learning 101” which I started writing in order to establish the vocabulary, and eventually decided to make it a free-standing walk-through of the machine learning process. If you want an introduction to machine learning from a Bayesian perspective, read this section! It serves as a high-level primer for people who are unfamiliar with machine learning but have some mathematical background.

Supervised machine learning is not the only variety - and it is not always presented as a Bayesian inference problem - but it is an incredible tool for anyone trying to learn how this all works.

If you want to gain expertise in the bias correction problem, and want a deep understanding of how the Bayesian formulation solves this, then you should read the whole thing! Of course, if you need to solve the bias correction problem for work and the deadline is coming up, feel free to skip to the answers in section 5 (and 6 for logistic regression)!

I hope that this research demonstrates that formulating a machine learning problem in the language of Bayesian Inference helps to break down and answer really tough questions. This solution would not have been possible without thinking in terms of Bayesian distributions - and I think this paper will serve as an excellent case study for people to understand why that is. Once you stop thinking in terms of exact answers and start thinking in terms of beliefs over possible answers, a whole new world of insights opens up.

Finally, I want this to be used in practice.

Much noise has been made about bias in datasets. The solution I present here will allow practitioners to make assumptions about this bias and peek at the consequences of those assumptions, which is a useful first step.

But more immediate is the original motivation of the paper, which is to reduce the size and adjust the composition of training sets so that computing becomes more efficient. The thinking behind this progresses as follows:

1) Big Data Product: Great news - we have a ton of data to run this model on! We can build something useful here.

2) Big Data Drawback: Hey, using all this data takes up a ton of resources. Are we past the point of diminishing returns? How about we throw out some data - pocket the savings - and the result will be just as good, or like 95% as good which is totally fine for us.

3) Uniform Random Sampling: How many examples do we have - 100 million? I think this will work on only 1 million, so for each datapoint I am going to pick a random number and keep it 1% of the time. Then we'll end up with around a million points.

4) Bias Sampling: But wait!! Some data points are more valuable than others because we have a label imbalance. So let's be more selective about what we throw away. This means we can either safely throw away more data, or we can get better performance from the 1% rate we did before.

Now that bias sampling can be dealt with appropriately, this method can be deployed routinely and hopefully bring about compounding savings.
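
Step 4 above can be sketched as label-dependent sampling with importance weights. The keep rates and function below are illustrative, not from the paper:

```python
import numpy as np

def biased_sample(X, y, keep_rates, seed=0):
    """Keep each example with a label-dependent probability and attach
    an importance weight of 1/rate, so that a weighted learner still
    targets the full dataset in expectation."""
    rng = np.random.default_rng(seed)
    rates = np.array([keep_rates[label] for label in y])
    keep = rng.random(len(y)) < rates
    return X[keep], y[keep], 1.0 / rates[keep]
```

The returned weights can then be fed to any learner that accepts per-example weights (such as a sample_weight argument), so the downsampled majority class still counts at full strength.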

Work still needs to be done in terms of solving for specific cases (as this paper does for logistic regression) and accounting for different sampling types. Let me know if you’re interested in any of these questions - and I’ll give you my assessment!

Finally, this exercise has made me come to understand that the teaching of Bayesian Inference, while fairly standardized, is still ripe for innovation from educators and systematizers.

For example, the mathematics of Bayes’ rule benefits from considering probability to be proportional rather than absolute. This allows us to safely remove all of those pesky constant factors, which are ultimately unnecessary and confuse everyone trying to follow the math. I tried to rely more on proportionalities and the idea of probability ratios rather than raw probabilities - but I’d like to see better notation around it. For example, in proportionality statements it should be obvious which symbols are considered variable and which are considered constant.
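
As a tiny illustration of working proportionally, the "constant factor" being dropped in Bayes' rule is the evidence P(data), which never needs to be computed. The numbers here are made up:

```python
import numpy as np

# Three hypotheses with a prior belief over them.
prior = np.array([0.5, 0.3, 0.2])
# P(observation | hypothesis) for some observed data.
likelihood = np.array([0.9, 0.5, 0.1])

# The posterior is PROPORTIONAL to likelihood * prior; the constant
# P(data) is dropped and recovered by normalizing once at the end.
unnorm = likelihood * prior
posterior = unnorm / unnorm.sum()
print(posterior)  # the hypothesis that best fits the data gains belief
```
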
I hope to continue this line of research and incorporate these tools into my future projects, including newmap.ai which is currently in development.

All the Lasts

I’m optimistic that we’ll reopen, and get back to normal (with some changes in our routines and habits) at some point soon, and all of this quarantine and lockdown will be relegated to a historical curiosity that occurred at some point in our lives. But now that I have a few minutes - and maybe I’m a little bored, or a little bit wanting to procrastinate - I’ve compiled a list of “last times” for things that I won’t be able to do until this virus runs its course in the world.

Fortunately, I’m still an avid Foursquare Swarm user (as well as an employee) so I can just go to foursquare.com/history (or the single player tab in the app itself) and get all the information I want. Here it is!

Last time shopping at a “big” store: Target, March 11th. It’s been delivery and a few convenience store stops since then.

Last time at a coffee shop to sit down: Coffee Project NY, March 10th - one of the most highly rated (and also expensive) coffee shops in Brooklyn. Very enjoyable Avocado Toast. Little did I know, that was it - it’s early quarantine time! Maybe if the city shut down with us on that date, we’d be in better shape to reopen now.

Last time eating at a lunch spot: Taco Suprema, March 9th, Fort Greene Brooklyn

Last time leaving Brooklyn: March 8th, Queens

Last time at a Brooklyn bagel place: Brownstone Bagel and Bread Co, March 7th - nice bagels, but the way I remember the place, it’s pretty much impossible to do social distancing in there. This is the last day I was in a small crowd.

There was another crowd I was in that day: the Apple Store. I got the latest iPhone on March 7th as well. Feels like a different era!

Last time in Manhattan: March 6th - coming home from work after finding out they’d shut down.

Last time outside New York state: March 2nd, leaving my parents in Connecticut. The toilets were out, so I had to drive to a Dunkin’ Donuts in the area that day (which was very clean compared to the ones in Brooklyn). I thought that was my major inconvenience for the year! Little did I know.

Last time eating at an actual restaurant: Red Rose Inn, Springfield MA (March 1st, on my way back from skiing).

Last time at a bar: General Stark’s Pub at Mad River Glen, Feb 29th.

Last time at a Gym that isn’t the space in front of my couch: Gramercy Place Health Club, February 24th

Last time at an in-person party: Cousin Rachel on Feb 23rd at St George’s Tavern Downtown

Last time at a sit down restaurant with other people: Parm downtown on February 18th (though I might also count Tacobi the next day, but that was a quick stop)

Last time in a movie theater: December 21, 2019. The Rise of Skywalker!

My Message To Zipcar Customer Support

Note: I’m posting this for entertainment value only, and not to get anything more from Zipcar. Despite this experience, it’s worked for me in the past - so I hope I can get it to work again. But I think this is pretty funny - and it won’t do any more damage to Zipcar than the stuff that’s already out there - so I’ll post it.

Another note: I should post more about the better experiences in life... but this was already written.

The Message
I originally reserved the car from my apartment building garage at XYXYXYXYXYX, Brooklyn for a fun night out from 6pm to midnight.

I unexpectedly got an email saying that the reservation was moved a few blocks away to YZYZYZYZYZY lot. No Problem!

I got there at 6, and the attendants said that the car isn't there.

First call to Zipcar Customer Service: Confirmed that the previous renter is late; decided to wait. We'll give you a small credit and some extra time.

Got a coffee... 6:30. Car still isn't there.

Second call to Zipcar: Ok, we'll give you a reservation in Park Slope. Oh wait that guy is late too! Ok on Henry Street.

So, we take a 15-minute Uber ride to Henry Street. The Zipcar app undoes my discount! Ok, call #3 - they say the discount is coming.

Get there at 7. The car is not there! Call Uber [sic.. meant Zipcar] again.. sorry the car isn't at this location - would you like another Brooklyn location - or I can try to locate the car.

Car located! It's 5 blocks away, but you have to cross under a busy highway in an industrial zone. Run across that street!

Ok, it takes a few minutes, but we honk and honk and finally find the car at 7:30.

Ended up getting back at 1:45 am in some Random Part of Brooklyn! Was that really a good idea? Probably not. Wanted to get home at midnight, but got home past 2am instead because of the second Uber ride.

I posted my Uber stubs below... but I certify the above story is accurate. I think I deserve an extra discount - I'm not sure I should rely on Zipcar in the future after this.