User:Niyogi

magazine

  • downloaded raw content for 615 sites (385 of them .com), stored in bz2 format

Next steps:

  • build feature lists using new wikipedia lexicon

category

  • have amazon and shopping for lexicon.txt

Next steps:

  • need ebay; figure out soap/php interface to ebay and get
  • rebuild cat maps

dmoz

  • have 120K/174K front pages; 1link.csv has "key features" now

Next steps:

  • build corpus of key features for each category in 1link.csv

ontok/ExtractAttributesfromText

  • prototyped the code; have seen it work for "thinkpad laptops"

Next steps:

  • test out search_by_product/brand on "600x ipod nano" etc.
  • write search_by_model code

ontok/ExtractLocations

  • use new city/state features to detect city/state combos quickly on "contact us" pages
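A rough sketch of how the city/state check might look, assuming "City, ST" is the pattern of interest on a contact page. The function name `find_city_state` and the three-entry `$states` table are placeholders for illustration, not the actual ontok code or feature list.

```php
<?php
// Hypothetical helper: find "City, ST" pairs in page text.
// $states maps two-letter codes to names; only a tiny sample is shown here.
function find_city_state(string $text, array $states): array {
    $found = [];
    // One to three capitalized words, a comma, then a two-letter code.
    $pattern = '/\b([A-Z][a-z]+(?: [A-Z][a-z]+){0,2}), ([A-Z]{2})\b/';
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Keep only codes that are known state abbreviations.
            if (isset($states[$m[2]])) {
                $found[] = ['city' => $m[1], 'state' => $m[2]];
            }
        }
    }
    return $found;
}

$states = ['CA' => 'California', 'NY' => 'New York', 'TX' => 'Texas'];
$page = "Contact us: 12 Main St, Palo Alto, CA 94301. Sales office: Austin, TX.";
print_r(find_city_state($page, $states));
```

A single regex pass keeps this cheap per page; the state lookup filters out capitalized phrases followed by an unrelated two-letter token.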

ontok/wikipedia/products

  • have wikipedia and product lexicon merged

Next steps:

 foreach ($titlearr as $title) {
   // expand the associations on:
   //   productbrand:   any product-brand combo appearing
   //   brandmodel:     anything that looks like a model (alphanumeric or 00 or short)
   //   productfeature: any product-feature combo appearing
   //   productunit:    any product-unit mapping
 }
 foreach ($brandarr as $brand) {
   // determine product associations
 }
 foreach ($brandmodel as $brand => $modelarr) {
   foreach ($modelarr as $model => $n) {
     //  determine product associations
   }
 }
 how to determine product associations:
   read in the productbrand table
   read yhoo search response, google suggest response
   detect "ma" features from output
   for brand links, check the productbrand table
   for brand-model links, check the productbrand table
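
One way the table lookups above could be sketched, assuming the productbrand table is loaded as product => [brand => count]. That layout, the sample rows, the function names, and the length cutoff of 3 for "short" are all assumptions made for illustration; the search-response and "ma" feature steps are not covered.

```php
<?php
// Hypothetical in-memory productbrand table: product => [brand => count].
$productbrand = [
    'laptop'     => ['thinkpad' => 12, 'dell' => 7],
    'mp3 player' => ['ipod' => 20],
];

// Model heuristic from the notes: alphanumeric, contains "00", or short.
// Meant for tokens that follow a known brand; the cutoff of 3 is a guess.
function looks_like_model(string $token): bool {
    $hasLetter = preg_match('/[a-z]/i', $token) === 1;
    $hasDigit  = preg_match('/\d/', $token) === 1;
    return ($hasLetter && $hasDigit)
        || strpos($token, '00') !== false
        || strlen($token) <= 3;
}

// Brand links: products listed for this brand, most frequent first.
function products_for_brand(array $productbrand, string $brand): array {
    $hits = [];
    foreach ($productbrand as $product => $brands) {
        if (isset($brands[$brand])) {
            $hits[$product] = $brands[$brand];
        }
    }
    arsort($hits); // sort by count, descending, keeping product keys
    return array_keys($hits);
}

// Brand-model links: a plausible model inherits its brand's products.
function products_for_brand_model(array $productbrand, string $brand, string $model): array {
    return looks_like_model($model) ? products_for_brand($productbrand, $brand) : [];
}

print_r(products_for_brand_model($productbrand, 'thinkpad', '600x'));
```

Routing brand-model pairs through the brand lookup matches the last two steps, which both consult the productbrand table.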