Tuesday 26 January 2010

Some of my PhD work

Though actually, both these pretty graphics were produced by my co-author, not me.



That one's the peptide carnosine, a substance that's used in the biochemical process of meat digestion (amongst other things). It comes in two main forms, imaginatively named A and B, depending on whether a particular bond is rotated 180 degrees or not. We're working on determining the shape of the thing. You see, 8 of its bonds can each rotate to one of 3 different positions: 0, 120 or 240 degrees. That's 3^8 = 6561 possible shapes. If it takes a supercomputer 80 or so CPU hours just to find the energy of one of those shapes... you can see how it would be good not to have to find them all.
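For anyone who wants to check the arithmetic, here is a small Python sketch that enumerates every combination of 8 bonds with 3 positions each and multiplies by the rough 80 CPU hours per shape quoted above. The variable names are made up for illustration, and the 80-hour figure is only the ballpark from this post.

```python
from itertools import product

ANGLES = (0, 120, 240)     # the three allowed positions, in degrees
N_BONDS = 8                # number of rotatable bonds
CPU_HOURS_PER_SHAPE = 80   # rough figure from the post, not a measurement

# Every possible shape is one choice of angle per bond.
conformers = list(product(ANGLES, repeat=N_BONDS))
n_shapes = len(conformers)

# Cost of the brute-force approach: evaluate every single shape.
total_cpu_hours = n_shapes * CPU_HOURS_PER_SHAPE

print(n_shapes, "shapes,", total_cpu_hours, "CPU hours to evaluate them all")
```

That works out to 6561 shapes and 524,880 CPU hours for an exhaustive search, which is why a smarter search is worth the trouble.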


That illustrates how well Genetic Algorithms perform when tuned with various population sizes and mutation rates. You can see that there's a roughly triangular area near the bottom, with populations of 40 or fewer and mutation rates of 20% to 45% (200/1000 to 450/1000), where the number of calculations needed to find the minimum energy (i.e. the preferred shape of the molecule) in the gaseous state is minimised.
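A minimal sketch of the general idea, not the actual research code: a tiny Genetic Algorithm over the 8 bond angles, with population size and mutation rate as the two knobs. Everything here is an assumption made for illustration: `toy_energy` and `TARGET` stand in for the real 80-CPU-hour energy calculation, the mutation rate is taken to be a per-bond probability, and the selection and crossover scheme is just one common choice.

```python
import random

ANGLES = (0, 120, 240)
N_BONDS = 8
# Made-up "lowest energy" shape, used only by the toy energy function.
TARGET = (120, 0, 240, 120, 0, 0, 240, 120)


def toy_energy(shape):
    """Hypothetical stand-in for the real calculation: count misplaced bonds."""
    return sum(a != b for a, b in zip(shape, TARGET))


def run_ga(pop_size, mutation_rate, max_generations=200, seed=0):
    """Return (best_shape, number_of_distinct_energy_calculations).

    pop_size must be at least 2. Stopping when the energy hits 0 only
    works because the toy function's minimum is known in advance.
    """
    rng = random.Random(seed)
    cache = {}  # each distinct shape is only "calculated" once

    def energy(shape):
        if shape not in cache:
            cache[shape] = toy_energy(shape)
        return cache[shape]

    pop = [tuple(rng.choice(ANGLES) for _ in range(N_BONDS))
           for _ in range(pop_size)]
    for _ in range(max_generations):
        pop.sort(key=energy)
        if energy(pop[0]) == 0:
            break
        parents = pop[:max(2, pop_size // 2)]  # fitter half breeds
        children = [pop[0]]                    # elitism: keep the best so far
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, N_BONDS)    # single-point crossover
            child = list(mum[:cut] + dad[cut:])
            for i in range(N_BONDS):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(ANGLES)
            children.append(tuple(child))
        pop = children
    return min(pop, key=energy), len(cache)
```

Calling `run_ga(pop_size=20, mutation_rate=0.3)` returns the best shape found and how many distinct shapes had to be evaluated. That second number is the quantity plotted in the graphic, and it can never exceed the 6561 shapes an exhaustive search would need.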

That is, the bluey-purplish bits.

Those are the results from using another Genetic Algorithm to find the best population size/mutation rate combinations when working on carnosine A.

The hollow squares have really good performance: trouble is, all the others found the right answer 100% of the time, and these three miscreants didn't, not within the maximum number of generations allotted, anyway.

We then tried the best 5 of these parameter pairs on carnosine B, and got very similar results.

In practical terms, it means that by using the recommended parameters, computational chemists working in this area will definitely be able to find what they're looking for in a month instead of half a year. And have a better than 50/50 chance of getting the right answer within a week.

In even more practical terms, by finding out the shapes of these molecules and their resultant electromagnetic fields, we gain some more of the tools needed to find out how enzymes and drugs work. Eventually, we should be able to determine what shape of molecule we need to cause a particular effect in biochemistry, and then work out what the composition of that molecule has to be.

I'm over-simplifying all over the place, and I'll have real chemists coming after me with axes. But you get the general idea.

p.s. Did I mention that this is Fun? I love being Zoe Brain, Girl Scientist! And who knows, maybe Dr Brain by 2012.

7 comments:

Unknown said...

I barely understood a word of that. lol
Go Dr. Brain!
So I have been wondering...who is the Pinky to your Brain?

Cynthia Lee

Laserlight said...

I'll have real chemists coming after me with axes

And the rotation for those axes is....? ;-)

Anonymous said...

Neat coincidence. I spent some time helping a biochem prof parallelize and optimize his protein folding solver. His problem was the "curse of dimensionality." The principal component analysis required finding eigenvectors for ridiculously huge matrices.

It's nice to work on projects like that, because you know just how important work like that is.

Unknown said...

And this will keep the beer from going flat, how? (Very proud of your efforts and message sent to the world, thanks.)

Vene said...

Oh, that is cool, thanks for sharing. But, I have a question, do you think that this could be used for larger molecules, like proteins instead of small peptides? Also, does this method take into account the active site's possibility of being at a higher energy level than the rest of the molecule? Although I'd imagine that's more of a problem for proteins instead of molecules this small.

Zoe Brain said...

Eric - yes, it should be useful for proteins too.

I'll send you a copy of the paper. That details the meta-method we will use for applying the work generally.

You can of course use a low level of theory to get a group of 20 which will contain the 10 best conformers, even if their energies are many hartree out. Then use a higher level of theory on them.
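As a rough sketch of that two-stage idea (the function and parameter names are hypothetical, and the lambdas in the demo are toy stand-ins for real low-level and high-level energy calculations):

```python
def two_stage_screen(shapes, cheap_energy, expensive_energy,
                     shortlist=20, keep=10):
    """Rank everything with the cheap model, then re-rank only the shortlist."""
    candidates = sorted(shapes, key=cheap_energy)[:shortlist]
    return sorted(candidates, key=expensive_energy)[:keep]


# Toy demo: the cheap model is wrong by a large constant offset, but it
# still puts the shapes in the right order, so the shortlist is sound.
best = two_stage_screen(
    range(100),
    cheap_energy=lambda x: x + 1000.0,
    expensive_energy=lambda x: float(x),
)
```

This only helps when the cheap model ranks conformers roughly correctly, even though its absolute energies are far off.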

We're using the UB3LYP density functional with the 6-31+G(d,p) basis set. So while it's not the best available (that would be MP2, CCSD, or CCSD(T)), it's not trivial either. It's not just PM3, as used by Kluev et al. (and pretty much everyone else).

Now I'd better get back to writing the paper for GECCO 2010 in Oregon. Hopefully it will be accepted. Deadline's 30 hrs away.

Sorry about the jargon, Cynthia and others. I did try to make it understandable - sorry I wasn't better at it. As you can see, things get a bit impenetrable for non-specialists very quickly as soon as we start discussing details.

Anonymous said...

Haha, "Dr. Brain" - that will be awesome. :D

By the way, what software did your co-author use to make the two graphics? They're quite nice.