Claus Wilke | Paragraph

Jan 28, 2025

We updated our survey of transfer learning with protein language models. Now it includes ESM C and AMPLIFY in addition to ESM-2 and ESMv. Quick conclusion: ESM C 600M is the model to use. But note reproducibility issues in next cast. 1/2 https://www.biorxiv.org/content/10.1101/2024.11.22.624936v2

Embedded content

Embedded content

Jan 23, 2025

I made an interactive worksheet that provides a basic introduction to the R language. Meant for people that may be familiar with Python or other languages and want to understand some of the core concepts of how R works. https://wilkelab.org/SDS366/worksheets/intro-to-R.html This is part of my class on data visualization with R.

Embedded content

Introduction to R – SDS 366

Jan 21, 2025

Now published: We trained a chemical language model that works with both small molecules and peptides. Ideal for making predictions on peptides with non-standard amino acids or other chemical modifications. https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c01441

Embedded content

Embedded content

Jan 13, 2025

Sunset in Austin, TX.

Embedded content

Dec 16, 2024

Get ready for Gen-AI-uary 2025, the AI version of this popular generative artists activity for the first month of each year. The prompts are available. Do one per day. Do them with AI. Post them in /ai-art. https://warpcast.com/piterpasma/0x1e9bdce8

Nov 26, 2024

Are your students above average, well above average, outstanding, or truly exceptional? And how do they compare to the current students at MIT?

Embedded content

Nov 25, 2024

The complete guide for transfer learning with the protein language model ESM-2. (In brief: Use ESM-2 650M and calculate mean embeddings across sites.) https://www.biorxiv.org/content/10.1101/2024.11.22.624936v1

Scaling Down for Efficiency: Medium-Sized Transformer Models for Protein Sequence Transfer Learning

bioRxiv

Scaling Down for Efficiency: Medium-Sized Transformer Models for Protein Sequence Transfer Learning

Protein language models such as the transformer-based Evolutionary Scale Modeling 2 (ESM2) can offer deep insights into evolutionary and structural properties of proteins. While larger models, such as ESM2 15B, promise to capture more complex patterns in sequence space, they also present practical challenges due to their high dimensionality and high computational cost. We systematically evaluated the performance of all ESM2 models across many biological datasets to determine the impact of model size on transfer learning. Surprisingly, larger models do not always outperform smaller ones, especially when data is limited. Medium sized models, such as ESM2 650M, exhibited consistent performance, falling only slightly behind the 15B parameter model despite being over 20 times smaller. Additionally, we compared various methods of embedding compression to identify the most effective approach, and we found that mean embeddings consistently outperformed other compression methods. Our results show that ESM2 650M with mean embeddings offers an optimal balance between performance and efficiency, making it a practical and scalable choice for transfer learning in a variety of biological applications. Significance Statement This work challenges the common belief that larger language models always yield better results, here in the context of protein biochemistry. By systematically comparing transformer models of different sizes in transfer learning tasks, we demonstrate that medium size models, such as ESM2 650M, frequently perform as well as larger variants, specially when data is limited. These findings provide a more efficient strategy for machine learning-based protein analysis and promote the broader accessibility of AI in biology. Smaller, more efficient models can help democratize advanced machine-learning tools, making them more accessible to researchers with limited computational resources. ### Competing Interest Statement The authors have declared no competing interest.

Nov 18, 2024

Oh wow, didn't expect this. Was there any news or any other identifiable catalyst?

Embedded content

Nov 18, 2024

AI will replace programmers long before it replaces biologists. Excellent interview with the CEO of Anthropic. One of the few tech people who can actually competently talk about biology. Must watch if you're wondering how AI will shape our future. Don't be put off by the 5 hours. It's actually three interviews in one. I don't know why Fridman didn't split this up. Not nice towards the other two interviewees who aren't even mentioned in the title. https://www.youtube.com/watch?v=ugvHCXCOmm4

- YouTube

Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.

Nov 13, 2024

Three open-rank positions at UT Austin: Cluster in Advances in Artificial Intelligence and Data Science for Environmental Systems Apply here: https://apply.interfolio.com/158908

Apply - Interfolio

Nov 13, 2024

After a quarter century, MicroStrategy has finally made a new all-time high.

Embedded content

Nov 10, 2024

I'd like to file a complaint. People told me that definitely Eth was cooked and so I thought I had more time to accumulate and now this. Can devs do something?

Embedded content

Nov 4, 2024

AI is not a replacement for competent software developers. Listen to this (excellent) interview with the Cursor team. If somebody knows about AI-generated code it's them. It's obvious there is a huge opportunity for AI-assisted coding, but emphasis is on "assisted." The AI still needs a lot of hand-holding and human supervision. https://www.youtube.com/watch?v=oFfVt3S51T4

- YouTube

Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.

Oct 19, 2024

We've turned on public mode for /ai-art. So for now everybody can cast, official channel members get priority. Channel rules still apply. https://warpcast.com/clauswilke/0xa5c7fd80

Oct 17, 2024

Some general rules for /ai-art: 1. This is a channel for artists. Just because you made an image with AI doesn't mean it's art. Put some real effort and intent into your work. The mods have seen thousands of low-effort pieces and are happy to moderate accounts that post them. 2. Watch your casting cadence. If you're making multiple casts per hour and taking over the timeline chances are you'll be moderated. 3. When casting links to external minting sites (e.g. Rodeo club) make sure the artwork is clearly visible in your cast. You may have to add it to your cast separately; it depends on the site you're linking to. If your cast looks wrong delete it and cast again.

Oct 15, 2024

Taking a break from channel signups. Will open up again soon.

Oct 9, 2024

Mood yesterday vs mood today.

Embedded content

Embedded content

Oct 9, 2024

Got quoted by Nate Silver on his most recent substack. https://www.natesilver.net/p/5545-is-a-really-close-race

Embedded content

55/45 is a really close race

55/45 is a really close race

Life lessons from gambling about understanding election probabilities.

Oct 8, 2024

There seems to be a bug in the @warpcast web interface where if I go to a channel with a pinned cast and then immediately go to a different channel (via the channel side bar on the left) the pinned cast remains. @dwr

Embedded content

Embedded content

Oct 7, 2024

Every time I walk past this retainer wall I feel a bit uncomfortable.

Embedded content

Embedded content

Claus Wilke | Paragraph