From cfortune at telus.net  Wed Jan 11 19:34:28 2006
From: cfortune at telus.net (Chris Fortune)
Date: Wed, 11 Jan 2006 19:34:28 -0800
Subject: [Info-www-search] daemonized application of www-search?
Message-ID: <00b201c61729$17b4ca20$24f351cf@codezilla>

Hello,

Has anybody written a (forking) daemon program, using www-search?
If so I would like to talk with you, or maybe even see your code

Chris Fortune, http://spamEater.com/
Thanks for supporting anti-spam R&D

From cfortune at telus.net  Wed Jan 11 19:39:24 2006
From: cfortune at telus.net (Chris Fortune)
Date: Wed, 11 Jan 2006 19:39:24 -0800
Subject: [Info-www-search] other similar projects?
Message-ID: <00bd01c61729$c8501fb0$24f351cf@codezilla>

Does anybody here know of other similar projects to WWW-Search?
(written in other languages or platforms?)

From www-search-list at brisammon.fastmail.fm  Tue Jan 31 19:27:11 2006
From: www-search-list at brisammon.fastmail.fm (Brian Sammon)
Date: Tue, 31 Jan 2006 22:27:11 -0500
Subject: [Info-www-search] patch for WWW::Search::Monster of
	WWW-Search-Jobs-2.01
Message-ID: <200602010326.k113QxX04139@gamma.isi.edu>

I have two patches for WWW::Search::Monster.
One to update it to better parse current monster search results.
One to update the POD.

I'm not sure it's completely working with these patches, but it's working
better.
-------------- next part --------------
--- lib/WWW/Search/Monster.pm	2006-01-31 22:14:41.000000000 -0500
+++ lib/WWW/Search/Monster.pm.new	2006-01-31 22:12:36.000000000 -0500
@@ -76,91 +76,159 @@

 =over 2

-=item * 1 Accounting/Auditing
+=item * 1 Accounting/Auditing

-=item * 2 Administrative and Support Services
+=item * 2 Administrative and Support Services

-=item * 8 Advertising/Marketing/Public Relations
+=item * 8 Advertising/Marketing/Public Relations

-=item * 540 Agriculture, Forestry, & Fishing
+=item * 5620 Aerospace/Aviation/Defense

-=item * 541 Architectural Services
+=item * 540 Agriculture, Forestry, & Fishing

-=item * 12 Arts, Entertainment, and Media
+=item * 9004 Airlines

-=item * 576 Banking
+=item * 541 Architectural Services

-=item * 46 Biotechnology and Pharmaceutical
+=item * 12 Arts, Entertainment, and Media

-=item * 542 Community, Social Services, and Nonprofit
+=item * 576 Banking

-=item * 543 Computers, Hardware
+=item * 46 Biotechnology and Pharmaceutical

-=item * 6 Computers, Software
+=item * 3979 Building and Grounds Maintenance

-=item * 544 Construction, Mining and Trades
+=item * 8125 Business Opportunity/Investment Required

-=item * 546 Consulting Services
+=item * 8126 Career Fairs

-=item * 545 Customer Service and Call Center
+=item * 9005 Computer Services

-=item * 3 Education, Training, and Library
+=item * 543 Computers, Hardware

-=item * 547 Employment Placement Agencies
+=item * 6 Computers, Software

-=item * 4 Engineering
+=item * 544 Construction, Mining and Trades

-=item * 548 Finance/Economics
+=item * 546 Consulting Services

-=item * 549 Financial Services
+=item * 5622 Consumer Products

-=item * 550 Government and Policy
+=item * 545 Customer Service and Call Center

-=item * 551 Healthcare, Other
+=item * 3 Education, Training, and Library

-=item * 9 Healthcare, Practitioner and Technician
+=item * 7305 Electronics

-=item * 552 Hospitality/Tourism
+=item * 547 Employment Placement Agencies

-=item * 5 Human Resources
+=item * 5624 Energy/Utilities

-=item * 660 Information Technology
+=item * 4 Engineering

-=item * 553 Installation, Maintenance, and Repair
+=item * 9002 Environmental Services

-=item * 45 Insurance
+=item * 3561 Executive Management

-=item * 554 Internet/E-Commerce
+=item * 548 Finance/Economics

-=item * 555 Law Enforcement, and Security
+=item * 549 Financial Services

-=item * 7 Legal
+=item * 550 Government and Policy

-=item * 47 Manufacturing and Production
+=item * 7306 Healthcare - Business Office & Finance

-=item * 556 Military
+=item * 2947 Healthcare - CNAs/Aides/MAs/Home Health

-=item * 11 Other
+=item * 3972 Healthcare - Laboratory/Pathology Services

-=item * 557 Personal Care and Service
+=item * 2963 Healthcare - LPNs & LVNs

-=item * 558 Real Estate
+=item * 2990 Healthcare - Medical & Dental Practitioners

-=item * 13 Restaurant and Food Service
+=item * 3007 Healthcare - Medical Records, Health IT & Informatics

-=item * 44 Retail/Wholesale
+=item * 9014 Healthcare - Optical

-=item * 10 Sales
+=item * 551 Healthcare, Other

-=item * 559 Science
+=item * 3973 Healthcare - Pharmacy

-=item * 560 Sports and Recreation
+=item * 3974 Healthcare - Radiology/Imaging

-=item * 561 Telecommunications
+=item * 3975 Healthcare - RNs & Nurse Management

-=item * 562 Transportation and Warehousing
+=item * 3976 Healthcare - Social Services/Mental Health

-=item
+=item * 3977 Healthcare - Support Services
+
+=item * 3978 Healthcare - Therapy/Rehab Services
+
+=item * 552 Hospitality/Tourism
+
+=item * 5 Human Resources/Recruiting
+
+=item * 660 Information Technology
+
+=item * 553 Installation, Maintenance, and Repair
+
+=item * 45 Insurance
+
+=item * 554 Internet/E-Commerce
+
+=item * 555 Law Enforcement, and Security
+
+=item * 7 Legal
+
+=item * 47 Manufacturing and Production
+
+=item * 556 Military
+
+=item * 542 Nonprofit
+
+=item * 9010 Operations Management
+
+=item * 11 Other
+
+=item * 557 Personal Care and Service
+
+=item * 9007 Product Management
+
+=item * 9008 Project/Program Management
+
+=item * 5623 Publishing/Printing
+
+=item * 7307 Purchasing
+
+=item * 558 Real Estate
+
+=item * 13 Restaurant and Food Service
+
+=item * 44 Retail/Wholesale
+
+=item * 10 Sales
+
+=item * 9009 Sales - Account Management
+
+=item * 9011 Sales - Telemarketing
+
+=item * 5957 Sales - Work at Home/Commission Only
+
+=item * 559 Science
+
+=item * 560 Sports and Recreation/Fitness
+
+=item * 5625 Supply Chain/Logistics
+
+=item * 561 Telecommunications
+
+=item * 9013 Textiles
+
+=item * 562 Transportation and Warehousing
+
+=item * 9003 Veterinary Services
+
+=item * 9006 Waste Management Services

 =back

-------------- next part --------------
--- lib/WWW/Search/Monster.pm.orig	2001-05-02 14:24:02.000000000 -0400
+++ lib/WWW/Search/Monster.pm	2006-01-31 22:14:41.000000000 -0500
@@ -264,56 +264,42 @@
     $content =~ s/ / /ig;
     $content =~ m/Jobs (\d+) to (\d+) of (\d+)/;
     my $nrows = $2 - $1 + 1;
-    if($content =~ m/Next page >>/) {
-        my $options;
-        my $nexturl;
-        PROCESS_FORM: while(1) {
-            $tag = $p->get_tag("form");
-            $nexturl = $self->{'search_base_url'} . '/'.
-                $tag->[1]{'action'} . '?';
-            while(1) {
-                $token = $p->get_token();
-                my $type = $token->[0];
-                $tag = $token->[1];
-                next PROCESS_FORM if($type eq 'E' && $tag eq 'form');
-                next if($tag ne 'input');
-                my $value = $token->[2]{'value'};
-                last PROCESS_FORM if ($value =~ m/Next page \>\>/);
-                next PROCESS_FORM if ($value =~ m/\<\< Previous page/);
-                my $name = $token->[2]{'name'};
-                my $escaped = WWW::Search::escape_query($value);
-                $nexturl .= "$name=$escaped" . '&' ;
-            }
-        }
-        print STDERR "Next url is $nexturl\n" if($debug);
-        $self->{'_next_url'} = $nexturl;
-    } else {
-        print STDERR "No next button\n" if($debug);
-    }
+    # Determine _next_url
+    my ($nexturl) =
+        ($content =~ /]*href="([^"]*)[^>]*>Next page >>{'_next_url'} = $self->{search_base_url} . $nexturl;
+
     my($hits_found) = 0;
     my($hit) = ();
     $p = new HTML::TokeParser(\$content);
+
+    #skim the content until we reach the header row of the main table
     while(1) {
         $tag = $p->get_tag("td");
         my $data = $p->get_trimmed_text("/td");
         last if($data eq 'Location' || $data eq 'Company' ||
-                $data eq 'Modified');
+                $data eq 'Modified'); # 'Modified' is not used anymore (Jan06)
     }
+
     for(my $i = 0; $i< $nrows; $i++) {
-        $tag = $p->get_tag("tr");
+        $tag = $p->get_tag("tr"); #Jump to beginning of next row
+
         $tag = $p->get_tag("td");
-        $tag = $p->get_tag("td"); # fix skew problem WR
         my $date = $p->get_trimmed_text("/td");
-        $tag = $p->get_tag("td");
-        my $location = $p->get_trimmed_text("/td");
+
         $tag = $p->get_tag("a");
         my $url = $self->{'search_base_url'} . $tag->[1]{href};
         my $title = $p->get_trimmed_text("/a");
+
         $tag = $p->get_tag("td");
         my $company = $p->get_trimmed_text("/td");
+
+        $tag = $p->get_tag("a");
+        my $location = $p->get_trimmed_text("/a");
+
         $hit = new WWW::SearchResult;
         $hit->url($url);
         $hit->company($company);

From iat at maebashi-it.org  Tue Jan 31 20:37:53 2006
From: iat at maebashi-it.org (IAT)
Date: Wed, 1 Feb 2006 13:37:53 +0900
Subject: [Info-www-search] Call for Workshop Proposals (WI-IAT'06)
Message-ID: <00f601c626e9$43c27250$9032a8c0@pc7kenji>

[Apologies if you receive this more than once]

==============================================================
Call for Workshop Proposals

2006 IEEE/WIC/ACM International Joint Conference on
Web Intelligence and Intelligent Agent Technology (WI-IAT'06)

Hong Kong Convention and Exhibition Centre, Hong Kong, China,
18-22 December 2006.

http://www.comp.hkbu.edu.hk/~wii06

(Workshop Proposals Due: 10 April 2006)
===================================================================

The Program Committees of the 2006 IEEE/WIC/ACM International Joint
Conference on Web Intelligence and Intelligent Agent Technology
(WI-IAT'06) invite proposals for Workshops.
The Workshops will be held at the beginning of the Conference,
December 18, 2006 at Hong Kong Convention and Exhibition Centre.

The workshop organizers will be responsible for advertising the
workshop, forming the program committees, reviewing and selecting the
papers, and guaranteeing a high quality worthy of the prestige and
range of the Conference.

All papers accepted for workshops will be included in the Workshop
Proceedings, which are expected to be published by IEEE Computer
Society Press and will be available at the workshops. The workshop
organizers will also have the discretion of editing selected papers
(after their expansion and revision) into books or special journal
issues.

Workshops may be full-day or half-day. A full-day workshop should
select 20-25 regular papers, while a half-day workshop should select
10-13 regular papers, from a large number of submissions. The workshop
organizers should ensure the presence of authors of accepted papers at
the workshops.

I. Workshop Topics

Each workshop subject will focus on new research challenges and
initiatives in Web Intelligence (WI) and Intelligent Agent Technology
(IAT). The workshops should provide an informal and vibrant forum for
researchers and industry practitioners to share their research results
and practical development experiences in these two fields.

Suggested workshop topics include, but are not limited to:

- Intelligent E-Technology (including E-Science, E-Business,
  E-Learning, E-Finance, E-Government, E-Community)
- Intelligent Human-Web Interaction
- Knowledge Grids and Grid Intelligence
- Semantics and Ontology Engineering
- Social Networks and Social Intelligence
- Ubiquitous Computing
- Web Agents
- Web Information Filtering and Retrieval
- Web Mining and Farming
- Web Security, Integrity, Privacy and Trust
- Web Services and Grid Services
- Web Support Systems
- World Wide Wisdom Web (W4)
- Agent Systems Modeling and Methodology
- Autonomous Knowledge and Information Agents
- Autonomous Auctions and Negotiation
- Autonomy-Oriented Computing (AOC)
- Learning and Self-Adapting Agents
- Distributed Intelligence

II. Workshop Proposal Submission

Workshop proposals should include the following elements:

- Title of the workshop
- Your name, affiliation, mailing address and e-mail address
- A description of the topic of the workshop (not exceeding 200 words)
- Type of the workshop (full-day or half-day)
- A description of how the workshop will contribute to the field of
  Web Intelligence and/or Intelligent Agent Technology
- A short description of how the workshop will be advertised so as to
  ensure a sufficiently wide range of authors and high quality papers

After a workshop proposal is accepted, the organizer(s) should:

- Create a "Call for papers/participation" for the workshop
- Create a Web page for the workshop, the link to which will be
  published on the Conference Web site
- Create a Board of Reviewers (Program Committee)
- Review and select papers
- Schedule the workshop activities

Papers selected by a workshop organizer will also be reviewed by the
Workshop Co-Chairs for final acceptance. All submitted papers will be
reviewed on the basis of technical quality, relevance, significance,
and clarity. We will provide an online paper submission and review
system to support the workshops.

III. Important Dates

- April 10, 2006: Workshop proposal submission due
  (Please send proposals by e-mail to all three Workshop Co-Chairs)
- April 20, 2006: Notification to workshop proposers
- April 30, 2006: Each workshop organizer sends out Call for Workshop
  Papers
- July 30, 2006: Due date for full workshop paper submission
  (at least two reviews for each paper)
- September 5, 2006: Final acceptance by Workshop Co-Chairs
- September 8, 2006: Notification of paper acceptance to authors
- October 8, 2006: Camera-ready copies of accepted papers due
- December 18, 2006: Workshop day

We look forward to your support in making the 2006 IEEE/WIC/ACM WI-IAT
workshops an exciting event.

Workshop Co-Chairs:

Cory J. Butz, University of Regina, Canada
E-mail: butz at cs.uregina.ca

Ngoc Thanh Nguyen, Wroclaw University of Technology, Poland
E-mail: thanh at pwr.wroc.pl

Yasufumi Takama, Tokyo Metropolitan University, Japan
E-mail: ytakama at cc.tmit.ac.jp

Note: we will not have a separate workshop registration fee this year
(i.e., a single conference registration covers everything).

For your information, the WI-IAT'06 conference will be co-located with
the IEEE International Conference on Data Mining (ICDM'06) to provide
synergy among the three research areas. It will provide opportunities
for technical collaboration beyond that of previous conferences. The
three conferences will have a joint opening, keynote, reception, and
banquet. Attendees only need to register for one conference and can
attend sessions across the three conferences. We are planning to have
a joint panel and joint paper sessions that discuss common problems in
the three areas.