Goal

Pancake currently looks only at page URLs and titles. Investigate what we can learn about pages by looking at their content. Examples include:

Page structure: headings, article text, etc.
Meta tags: icons, authors, etc.
Embedded microformats such as recipes, contacts, geo-information, etc.
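To get a feel for how hard the extraction is, here is a minimal sketch that uses only Python's standard-library `html.parser`. It collects headings, `<meta name=... content=...>` pairs, icon links, and microformats2 root classes (class names starting with `h-`, such as `h-recipe` or `h-card`). The `PageSignals` class, the `extract` helper and the sample page are hypothetical and not part of Pancake. A real crawler would probably need a more forgiving parser, and would also have to handle classic microformats (`vcard`, `hrecipe`, `geo`) and other embedded formats such as JSON-LD or microdata.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageSignals(HTMLParser):
    """Hypothetical helper: collects structure and metadata signals from one HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []          # (tag, text) pairs in document order
        self.meta = {}              # lowercased meta name/property -> content
        self.icons = []             # href of every <link> whose rel contains "icon"
        self.microformats = set()   # microformats2 root classes, e.g. "h-recipe"
        self._open_heading = None
        self._heading_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for valueless attributes
        # microformats2 root classes are prefixed with "h-"
        for cls in (attrs.get("class") or "").split():
            if cls.startswith("h-"):
                self.microformats.add(cls)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link":
            rel_tokens = (attrs.get("rel") or "").lower().split()
            if "icon" in rel_tokens and attrs.get("href"):
                self.icons.append(attrs["href"])
        elif tag in HEADING_TAGS:
            self._open_heading = tag
            self._heading_text = []

    def handle_data(self, data):
        # Only text inside an open heading is kept; article text would need similar handling
        if self._open_heading:
            self._heading_text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            text = " ".join("".join(self._heading_text).split())  # collapse whitespace
            self.headings.append((tag, text))
            self._open_heading = None


def extract(html):
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    return parser


page = """<html><head><title>Pancakes</title>
<meta name="author" content="Jane Doe">
<link rel="shortcut icon" href="/favicon.ico">
</head><body>
<article class="h-recipe"><h1 class="p-name">Fluffy pancakes</h1>
<h2>Ingredients</h2></article></body></html>"""

signals = extract(page)
print(signals.headings)      # [('h1', 'Fluffy pancakes'), ('h2', 'Ingredients')]
print(signals.meta)          # {'author': 'Jane Doe'}
print(signals.icons)         # ['/favicon.ico']
print(signals.microformats)  # {'h-recipe'}
```

Running a parser like this over a sample of real pages is one cheap way to answer "how easy is it" before committing to a heavier extraction library.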

We should find out how easy it is to find and extract this information, and whether enough pages carry useful information for us to do something with it.
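One way to check whether enough pages carry useful information is to measure, for each kind of signal, the share of sampled pages where it is present. The sketch below assumes each page has already been reduced to a dict of extracted fields; the field names and sample data are hypothetical, and what counts as "enough" is left open here.

```python
from collections import Counter


def signal_coverage(pages):
    """Return, per signal name, the fraction of pages where that signal is non-empty."""
    if not pages:
        return {}
    counts = Counter()
    names = set()
    for page in pages:
        names.update(page)
        for name, value in page.items():
            if value:  # None, "", [] and {} all count as "absent"
                counts[name] += 1
    # Include signals that never appear, so zero coverage is visible too
    return {name: counts[name] / len(pages) for name in sorted(names)}


sample = [
    {"headings": ["Intro"], "author": "Jane", "microformats": []},
    {"headings": [], "author": None, "microformats": ["h-card"]},
    {"headings": ["About"], "author": "", "microformats": []},
    {"headings": ["News"], "author": "Sam", "microformats": []},
]
print(signal_coverage(sample))
# {'author': 0.5, 'headings': 0.75, 'microformats': 0.25}
```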

How can we use the extracted information for generic results, and possibly for very domain-specific results such as "people", "recipes" or "locations"?
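For the domain-specific case, one possible starting point is a lookup from microformat root classes to result types. microformats2 defines `h-card`, `h-recipe`, `h-geo` and `h-adr`, and the classic formats use `vcard`, `hrecipe`, `geo` and `adr`. The `VERTICALS` table, the vertical names and `verticals_for` are hypothetical. Mapping `h-card` to "people" is a simplification, since it also describes organizations, and classic class names like `geo` are generic enough to produce false positives.

```python
# Hypothetical mapping from microformat root classes to domain-specific result types.
VERTICALS = {
    "h-card": "people",     # also used for organizations; simplified here
    "vcard": "people",
    "h-recipe": "recipes",
    "hrecipe": "recipes",
    "h-geo": "locations",
    "geo": "locations",     # classic class name, prone to false positives
    "h-adr": "locations",
    "adr": "locations",
}


def verticals_for(microformat_classes):
    """Return the sorted verticals a page could feed; an empty list means generic results only."""
    return sorted({VERTICALS[c] for c in microformat_classes if c in VERTICALS})


print(verticals_for({"h-card", "h-geo", "h-entry"}))  # ['locations', 'people']
print(verticals_for(["h-entry"]))                     # []
```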

Extracting meta-data from pages