Astrometry.net

The 2007 Data Mining in Aeronautics, Science and Exploration Systems (DMASES 2007) meeting was held recently in Mountain View, California (June 26-27, 2007).  Ashok Srivastava had asked me to chair the Science Session, and we were fortunate to attract a wide array of excellent speakers.  Among them was Sam Roweis, who is on sabbatical at Google and presented a live demonstration of “astrometry.net”.  In an earlier post (which has been deleted) I described this endeavor as GoogleSky; however, Sam emailed me to explain that he has nothing to do with GoogleSky, which was started before astrometry.net.  Roweis works on Astrometry.net with colleagues David Hogg, Dustin Lang and Keir Mierle as a collaboration between Google researchers at the University of Toronto and New York University.  Google will get involved when they are ready to implement the system at a large scale.

Astrometry.net is poised to revolutionize astronomy by making any and all images of the night sky taken over time accessible to astronomers and those who perform data analysis or data mining on astronomical images.  This includes images ranging from those taken by the Hubble Space Telescope (HST) to those taken by a tourist who happened to catch a portion of the night sky in their photograph.

Astrometry.net can take ANY image of the night sky and determine the precise astronomical coordinates of the region it shows.  Astrometry.net is so good, Roweis explained, that they will not need to rely on any information from the photographer; so good that they will not trust the information from the photographer.  Roweis had intended for the audience to supply a set of astronomical images to be identified, but we did not have time to set that up beforehand.  Instead, he downloaded the Astronomy Picture of the Day and uploaded it to the NYU server for identification.

Roweis explained that the problem is not as complex as one might think.  Astrometry.net begins by selecting the stars in the image.  These stars are then compared to a database of millions of stars, and matches are found.  Surprisingly, only a handful of matches are needed to uniquely identify the image.  When the results are examined, many stars in the image have no match in the limited database, and many stars in the database have no match in the image (due to clouds, exposure, etc.).  The problem is merely that of matching points… not trivial, but overall straightforward.
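To make the "matching points" idea concrete, here is a minimal Python sketch.  It is my own toy illustration, not the algorithm Astrometry.net actually uses: the function name match_stars, the tolerance, and the made-up star positions are all assumptions.  It treats each star as a point in the plane, guesses a rotation, scale and shift from one pair of image stars and one pair of catalog stars, and keeps the guess that lands the most image stars on catalog stars.

```python
import itertools

def match_stars(image, catalog, tol=1e-3):
    """Brute-force search for the rotation, scale and shift that map the
    most image stars onto catalog stars.

    Points are complex numbers x + 1j*y, so such a transform is simply
    z -> a*z + b.  Assumes all points are distinct.
    """
    best = (0, None, None)  # (number of matched stars, a, b)
    for zi, zj in itertools.combinations(image, 2):
        for ck, cl in itertools.permutations(catalog, 2):
            a = (cl - ck) / (zj - zi)   # rotation and scale from one pair
            b = ck - a * zi             # shift that puts zi on ck
            hits = sum(
                1 for z in image
                if min(abs(a * z + b - c) for c in catalog) < tol
            )
            if hits > best[0]:
                best = (hits, a, b)
    return best

# Hypothetical data: an eight-star "catalog" and an "image" that shows five
# of those stars (rotated, scaled and shifted) plus two spurious detections.
catalog = [0.3 + 0.1j, 4.1 + 1.7j, 1.2 + 5.9j, 7.4 + 3.3j,
           2.6 + 9.1j, 8.8 + 8.2j, 5.5 + 6.4j, 9.7 + 0.6j]
true_a, true_b = 2j, 1 + 1j
image = [(c - true_b) / true_a for c in catalog[1:6]]
image += [10 + 10j, -3.3 + 7.1j]

hits, a, b = match_stars(image, catalog)
print(hits, a, b)
```

Even in this toy, some image stars have no partner in the catalog and some catalog stars never appear in the image, yet a handful of agreeing stars pins down the transform.  A search this naive would be hopeless against a database of millions of stars, which is where the real cleverness must lie.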

In the future, Astrometry.net intends to implement planet matching, which may allow them to identify the time of the photograph in addition to the sky region.

Similar projects can be found at:
http://www.wikisky.org
which is the same as
http://www.sky-map.org/

Kevin Knuth
Albany NY

Posted under Astronomy, Computation, Exploration, Internet, Photography, Research, Space

This post was written by drknuth on July 17, 2007

Tyrannosaurus Rex Blood?

Image of potential T. Rex blood cells

Mary Schweitzer, who amazed us by discovering remains of soft tissue in dinosaur bones, has been continuing to make new discoveries.  NOVA Science Now is going to broadcast a special on July 17th on the possibility that Dr. Schweitzer has discovered the remains of Tyrannosaurus Rex blood (see picture above).

Kevin Knuth
Albany NY

Posted under Dinosaurs, Evolution, Paleontology

This post was written by drknuth on July 14, 2007

MaxEnt 2007

Friday marked the closing session of the 27th International Workshop on Bayesian and Maximum Entropy Methods in Science and Engineering (MaxEnt 2007).  I had the great pleasure to host this year’s meeting in the lovely city of Saratoga Springs.  We had approximately 100 participants from almost 25 countries spanning all six of the populated continents!

This year marked the 50th anniversary of Ed Jaynes’ ground-breaking 1957 paper “Information Theory and Statistical Mechanics”, in which he introduced the idea that Statistical Mechanics is an Inferential Theory.  This paper led to the concept of Maximum Entropy, which is used to assign priors in Bayesian Probability Theory and which Adom Giffin showed at this meeting to be consistent with Bayesian learning.
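For readers who have not seen Maximum Entropy used to assign probabilities, here is a small Python sketch of a standard textbook-style exercise (it is not taken from any talk at the meeting).  Suppose all we know about a six-sided die is its long-run average.  The maximum-entropy assignment under that single constraint has the form p_i proportional to exp(lam * i), and the sketch finds lam by bisection.  The function name maxent_die and the target mean of 4.5 are my own choices for illustration.

```python
import math

def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6)):
    """Maximum-entropy probabilities for a die with a prescribed mean.

    The solution has the form p_i proportional to exp(lam * i); lam is
    found by bisection so that the mean matches target_mean, which must
    lie strictly between the smallest and largest face.
    """
    def mean_for(lam):
        weights = [math.exp(lam * f) for f in faces]
        total = sum(weights)
        return sum(f * w for f, w in zip(faces, weights)) / total

    lo, hi = -10.0, 10.0          # mean_for increases with lam
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    weights = [math.exp(lam * f) for f in faces]
    total = sum(weights)
    return [w / total for w in weights]

probs = maxent_die(4.5)
print([round(p, 3) for p in probs])
```

With a target mean of 3.5 the same function returns the uniform distribution, since the constraint then carries no information beyond what a fair die already implies.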

I was very pleased to have had a distinguished array of invited speakers including: Shun-ichi Amari (RIKEN, JAPAN), Jose M. Bernardo (Universitat de Valencia, SPAIN), Tony Bell (Redwood Neuroscience Institute, USA), Philip Goyal (Perimeter Institute, CANADA), Phil Gregory (University of British Columbia, CANADA), and Stephen Roberts (Oxford, UK).  I will blog over the next few days about some of the ideas that were presented at this meeting.

I also was extremely pleased with the Tutorials, which were presented by John Skilling, Jose Bernardo, Ariel Caticha, Carlos Rodriguez, and myself.  They dealt heavily with the foundations of probability theory and connected closely with information theory, geometry, and order theory.  Rather than being traditional tutorials, the majority of these talks presented new ideas and new results!

Every year the MaxEnt meeting takes on its own personality.  In 2005 in San Jose, the focus was on sampling methods, and I was pleased to have John Skilling’s contribution on Nested Sampling in my volume.  This year, the focus was on the Foundations of Probability Theory, Information Geometry, Entropy and Bayes, Lattices and Measures, Levels and Loops, Information and Physics, and Quantum Mechanics.  The 2007 Proceedings Volume will contain a large number of cutting-edge papers on these exciting topics.  The collective atmosphere of the meeting seemed to indicate that this community is close to making some exciting breakthroughs.

Many of us at the meeting, myself included, were extremely disappointed that some individuals were unable to attend due to visa problems.  One such individual, who has been prominent in our community, was denied entry because he holds dual French-Iranian citizenship.  These policies enacted by our politicians are as damaging to the scientific community, and to the advances that we work to provide for humanity, as they are to the individuals and their well-being.  It is high time that our nations stop acting like children.

Next year, MaxEnt 2008 will be hosted by Julio M. Stern on the beaches near São Paulo, BRAZIL.  MaxEnt 2009 will be hosted by Paul Goggans in University, Mississippi, USA, and MaxEnt 2010 will be held in Grenoble, FRANCE.

Kevin Knuth
Albany NY

Posted under Mathematics, Philosophy, Physics, Probability, Research

This post was written by drknuth on July 14, 2007

GoogleSky at DMASES 2007

Sam Roweis emailed me to explain that he is affiliated with astrometry.net.
I have re-written this entry and corrected the errors.

http://www.huginn.com/knuth/blog/2007/07/17/astrometrynet/ 

Kevin Knuth
Albany NY

Posted under Astronomy, Computation, Internet, Photography, Research, Software, Space

This post was written by drknuth on July 14, 2007

Topography in Machine Learning

I recently chaired the Science Session of the Data Mining in Aeronautics, Science and Exploration Systems (DMASES 2007) Conference at the Computer History Museum in Mountain View, CA.  I gave a short 20-minute talk on Problem Solving and referred to a book by David Perkins that I really enjoy, called The Eureka Effect: The Art and Logic of Breakthrough Thinking.

Perkins gives names to various topographical traps that impair our efforts toward effective problem solving.  By giving something a name, we have power over it.  This enables us to think more clearly about these problems in learning that must be overcome.

The Wilderness Trap reflects the fact that the space is immense.  There are so many possible solutions, it is virtually impossible to explore them all.

The Clueless Plateau is a flat region of the solution space that contains virtually no information as to where to go to find the more probable solutions (the peaks).

The Canyon Trap is artificially imposed by the problem solver as they restrict themselves to a subspace of the solution space that does not contain the solution to the problem.

The Oasis Trap is a local solution (local maximum in probability) that seems so promising that you don’t want to leave it.  However, it is nowhere near the true solution to the problem.

Last, the Solution is a high peak in the space (there’s gold in them thar hills!).  The base of the solution peak can be wide, which makes the peak easy to find, or extremely narrow, which makes the problem difficult.
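The traps above are easy to reproduce on a computer.  Here is a minimal Python sketch using a toy one-dimensional landscape of my own invention (the peak positions, heights and the function names are assumptions, not anything from Perkins' book).  A greedy climber that only ever steps to a better neighbor gets stuck on the Oasis or wanders nowhere on the Plateau, and a search confined to the wrong interval plays the part of the Canyon.

```python
def landscape(x):
    """Toy 1-D solution space; higher values are better solutions."""
    oasis = 5 - 0.5 * abs(x - 20)      # broad, modest local peak
    solution = 10 - 2 * abs(x - 80)    # tall peak with a narrow base
    return max(oasis, solution, 0)     # 0 everywhere else: the plateau

def hill_climb(f, x, lo=0, hi=99):
    """Greedy search on integers: step to a better neighbor until none exists."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            return x
        x = best

print(hill_climb(landscape, 25))          # Oasis Trap: climbs to 20 and stays
print(hill_climb(landscape, 50))          # Clueless Plateau: never moves from 50
print(hill_climb(landscape, 78))          # lucky start at the base of the peak: 80
print(max(range(0, 61), key=landscape))   # Canyon Trap: best point in 0..60 is 20
```

The last line is the telling one: even an exhaustive search cannot succeed if the searcher has fenced off the part of the space that holds the solution.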

In his book he relates problem solving to puzzles, which provide a set of clear and familiar examples in which to explore the various traps that inhibit effective problem solving.  In my talk, I referred to the Nine Dot Puzzle, which, as Perkins explains, exhibits features of each of the four traps above.

I also explored communication as inference, where the job of the listener is to infer and model the thoughts of the speaker.  This perspective of communication as a pairing of encoding and inference leads nicely to seeing jokes as problems that exhibit one or more of these classic topographical difficulties.  Many jokes, and much of humor, rely on trapping the audience.

A favorite of mine is raised by the question “When geese fly in a ‘V’, why is it that one leg of the ‘V’ is often longer than the other?” 
Think about it before going on… maybe those of you with some aeronautics or biology experience can think of the answer.  The solution will be revealed at the end.

This joke/riddle is effective because it leads the listener into a Canyon Trap, where they constrain themselves to a subspace of the solution space.  The subspace searched is that of profound answers, such as aerodynamics, efficiency, biological considerations, etc.  However, the solution lies in the space of mundane solutions, as you can see below.

Solving puzzles, jokes and riddles is analogous to solving problems in physics, chemistry, and machine learning.  The traps are the same, and this fact can give us unique insights into designing machine learning algorithms.  Specifically, the best algorithms will rely on both educated guesses and exploration… that is, heuristics and sampling.  Breakthroughs in machine learning are analogous to breakthroughs in creative thinking and problem solving.  These are the AHAs of Martin Gardner.
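As a rough illustration of pairing heuristics with sampling, here is a short Python sketch.  The toy landscape, the function names and the number of starting points are all my own assumptions.  The heuristic is "follow the slope uphill"; the sampling is "start from many random places and keep the best finish".

```python
import random

def landscape(x):
    """Toy 1-D solution space: a broad local peak near 20, a narrow true peak at 80."""
    return max(5 - 0.5 * abs(x - 20), 10 - 2 * abs(x - 80), 0)

def climb(f, x, lo=0, hi=99):
    """Heuristic: follow the local slope uphill until it runs out."""
    while True:
        best = max((n for n in (x - 1, x + 1) if lo <= n <= hi), key=f)
        if f(best) <= f(x):
            return x
        x = best

def sample_and_climb(f, n_starts, rng, lo=0, hi=99):
    """Exploration: draw random starting points, refine each one, keep the best."""
    finishes = [climb(f, rng.randint(lo, hi), lo, hi) for _ in range(n_starts)]
    return max(finishes, key=f)

rng = random.Random(0)   # fixed seed so the run is repeatable
print(sample_and_climb(landscape, 200, rng))
```

Neither ingredient works well alone here.  Climbing from a single start usually ends on the broad local peak or on the flat region, while random sampling without climbing has to land exactly on the best point by luck.  Together, any sample that falls near the base of the narrow peak is carried to its top.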

In a future post, I will describe how these AHAs are analogous to phase transitions in statistical mechanics.

Well, you made it to the end, and here is the answer to the riddle: “There are more geese in it!”

Happy problem solving!

Kevin Knuth
Albany NY

Posted under Intelligent Systems, Philosophy, Research, Solutions

This post was written by drknuth on July 7, 2007