(More) Statistics Without the Agonizing Pain: Probability Distributions

One of my favorite conference talks of all time is Statistics Without the Agonizing Pain by John Rauser, who was at that time head of data science at Pinterest. In this talk, he explains the statistical argument underpinning the Student's t-test in simple, approachable terms using an unforgettable example involving mosquitoes and beer. It's about 15 minutes long and well worth your time if you haven't watched it before.

After watching that video, I realized that—like most things—statistics is complex but ultimately straightforward once you understand the underlying ideas. The problem is that the modern approach to teaching statistics often gets in the way of that understanding. Historically, statistical methods were designed for a world where all computation had to be done by hand, so they were optimized to minimize calculation, not to maximize clarity or intuition. That design choice still has value today—efficient algorithms make modern statistical programs fast and practical. But we continue to teach statistics as if computation were still the bottleneck, even though we now all carry supercomputers in our pockets. Seen in that light, it’s obvious that the way we teach statistics has not kept up with the way we practice statistics.

To that end, I thought I'd write down some things I've learned about statistics over the years in a way that I hope is clearer than the average statistics textbook: partly so I don't forget them, but also in the hope that they'll be useful to others, too.

Continue reading “(More) Statistics Without the Agonizing Pain: Probability Distributions”

Having Trouble with Bedrock Errors? Check Cross-Region Inference.

I just tried to use one of the Amazon Nova models in AWS, and I got the following error message:

User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/AWSReservedSSO_AccessLevelGoesHere_bec5d7da6a6db396/andy@example.com
Action: bedrock:InvokeModelWithResponseStream
On resource(s): arn:aws:bedrock:us-west-2::foundation-model/amazon.nova-micro-v1:0
Context: a service control policy explicitly denies the action

After some tooling around, I noticed that the error message listed us-west-2, even though I was working in us-east-2. This sent me down a little rabbit hole that ultimately ended at Cross-Region Inference for Amazon Bedrock. Fortunately, that article links to a nice solution.
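For context, cross-region inference routes requests through geography-prefixed inference profile IDs (e.g., us.amazon.nova-micro-v1:0 instead of the plain foundation model ID amazon.nova-micro-v1:0), which is why an error can mention a region you never selected. Here's a minimal sketch of that naming convention; the helper class and its abbreviated region table are my own illustration, not an AWS API:

```java
import java.util.Map;

public class InferenceProfiles {
    // Geographic prefixes used by Bedrock cross-region inference profile IDs.
    // Assumption: only a few example regions are listed here for illustration.
    private static final Map<String, String> GEO_PREFIX = Map.of(
            "us-east-1", "us",
            "us-east-2", "us",
            "us-west-2", "us",
            "eu-west-1", "eu",
            "eu-central-1", "eu",
            "ap-northeast-1", "apac");

    /** Builds the cross-region inference profile ID for a foundation model ID. */
    public static String inferenceProfileId(String region, String modelId) {
        String prefix = GEO_PREFIX.get(region);
        if (prefix == null)
            throw new IllegalArgumentException("Unknown region: " + region);
        return prefix + "." + modelId;
    }

    public static void main(String[] args) {
        // Requests from us-east-2 may be served by any US-geography region,
        // which is how us-west-2 can show up in an error message.
        System.out.println(inferenceProfileId("us-east-2", "amazon.nova-micro-v1:0"));
        // prints us.amazon.nova-micro-v1:0
    }
}
```

Passing the profile ID rather than the bare model ID is the usual fix, provided your service control policies allow Bedrock in every region of the geography.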

Continue reading “Having Trouble with Bedrock Errors? Check Cross-Region Inference.”

AI is Getting Really Useful for SQL

I’m using Google BigQuery to do some ETL, and have found OpenAI’s products to be enormously helpful for the task.

A new client recently asked for some assistance working with Sunshine Act data. Since I expect additional asks about this data set to come in over time, rather than fuss with the generic UI, I decided to load the entire dataset into BigQuery. ChatGPT's o3-mini-high model has generated schemas and ETL queries extremely well, accelerating my work by at least 2x.

Continue reading “AI is Getting Really Useful for SQL”

Introducing Rapier

Rapier is a code generation companion library for Google Dagger. It is designed to reduce boilerplate by generating Dagger modules for fetching configuration data from common sources.

If you’ve ever written Dagger code like this:

@Component(modules = {RapierExampleComponentEnvironmentVariableModule.class})
public interface ExampleComponent {
    @EnvironmentVariable(value = "TIMEOUT", defaultValue = "30000")
    public long getTimeout();
}

Then Rapier can help!

Continue reading “Introducing Rapier”

Jackson CSV Serialization and Deserialization from the Ground Up

While there are many examples of Jackson serialization to JSON, there are comparatively few resources covering Jackson serialization to CSV. What follows is a ground-up example of working with a TSV-formatted dataset: creating the model object, parsing CSV into Java objects with Jackson, writing Java objects back to CSV with Jackson, and finishing with a full round-trip serialization test.
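The full post walks through each step; as a taste, here's a minimal sketch of the round trip using the jackson-dataformat-csv module's CsvMapper and CsvSchema. The Person record is a stand-in I made up for illustration, not the post's actual model object:

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;

public class TsvRoundTrip {
    // Hypothetical model object; the post builds its own from the real dataset.
    @JsonPropertyOrder({ "name", "age" })
    public record Person(String name, int age) {}

    private static final CsvMapper MAPPER = new CsvMapper();

    // A schema derived from the model class, with a header row and
    // tab-separated columns (i.e., TSV rather than CSV)
    private static final CsvSchema SCHEMA = MAPPER.schemaFor(Person.class)
            .withHeader()
            .withColumnSeparator('\t');

    /** Java objects -> TSV text. */
    public static String toTsv(List<Person> people) throws Exception {
        return MAPPER.writer(SCHEMA).writeValueAsString(people);
    }

    /** TSV text -> Java objects. */
    public static List<Person> fromTsv(String tsv) throws Exception {
        try (MappingIterator<Person> it =
                MAPPER.readerFor(Person.class).with(SCHEMA).readValues(tsv)) {
            return it.readAll();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Person> people = List.of(new Person("Ada", 36), new Person("Bob", 41));
        String tsv = toTsv(people);
        System.out.print(tsv);
        System.out.println(fromTsv(tsv).equals(people)); // round trip succeeds
    }
}
```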

Continue reading “Jackson CSV Serialization and Deserialization from the Ground Up”

Generating Java record classes with Jackson Annotations to map JSON using ChatGPT

There’s a lot of discussion about how to use ChatGPT to generate tests for code. Another interesting use case I’ve seen fairly little coverage of is generating DTOs from JSON. Here is an example with the prompt I’ve put together applied to JSON from the manifest of a Distributed Map Run.
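To give a sense of the target output, here's the kind of record such a prompt produces. The JSON shape and field names below are hypothetical stand-ins I chose for illustration; the real post applies the prompt to the actual manifest JSON:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class RecordDtoExample {
    // A Java record DTO with Jackson annotations mapping snake_case JSON
    // field names onto camelCase record components
    public record User(@JsonProperty("user_name") String userName,
                       @JsonProperty("id") long id) {}

    public static void main(String[] args) throws Exception {
        String json = "{\"user_name\":\"andy\",\"id\":42}";
        User user = new ObjectMapper().readValue(json, User.class);
        System.out.println(user.userName() + " " + user.id()); // prints andy 42
    }
}
```

Records work well here because Jackson (2.12+) can deserialize directly through the canonical constructor, so no setters or builders are needed.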

Continue reading “Generating Java record classes with Jackson Annotations to map JSON using ChatGPT”

AWS SageMaker Object Detection Training Gotchas

As part of updates to arachn.io, I’ve started tinkering with object detection machine learning models. During my experiments on AWS SageMaker, I found that AutoPilot does not support object detection models, so I had to train using notebooks. As a result, I hit some “gotchas” fine-tuning TensorFlow Object Detection models. While this notebook works a treat on its own training data (at least when run through SageMaker Studio), this discussion will focus on things I learned while trying to run it on my own data on August 31, 2024.

Continue reading “AWS SageMaker Object Detection Training Gotchas”

Efficient Image Metadata Extraction with Java

Java has a rich set of tools for processing images built into the standard library. However, it’s not always clear how to use that library to perform even simple tasks. There are already lots of great guides out there for working with images once they’re loaded… but what can Java do without ever loading the image into memory at all?

When working with images from untrusted sources — for example, images discovered during a web crawl — it’s best to treat data defensively. This article will show how to perform some useful tasks on images without ever loading their pixel data into memory.
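The article develops this in detail; as a taste, here's a minimal, self-contained sketch using the standard library's ImageIO reader plumbing to extract an image's dimensions from its header without decoding pixel data. The class and method names are my own, not necessarily the article's:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class ImageMetadata {
    /**
     * Reads an image's width and height from its header without decoding
     * any pixel data into memory.
     */
    public static int[] dimensions(InputStream raw) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(raw)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext())
                throw new IOException("Unrecognized image format");
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly=true, ignoreMetadata=true: read once,
                // front to back, skipping ancillary metadata
                reader.setInput(in, true, true);
                // getWidth/getHeight consult header fields only; the pixel
                // data is never decoded
                return new int[] { reader.getWidth(0), reader.getHeight(0) };
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory so the example is self-contained
        BufferedImage img = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        int[] wh = dimensions(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(wh[0] + "x" + wh[1]); // prints 64x48
    }
}
```

This defensive approach means a maliciously huge or malformed image from a crawl can be rejected by its declared dimensions before you commit memory to decoding it.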

Continue reading “Efficient Image Metadata Extraction with Java”