[Question] How do I get the value inside an HTML tag?

飛

2006-10-10 01:15:57 UTC

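// Capture the text between the double quotes of each href attribute.
Pattern pattern = Pattern.compile("href=\"([^\"]*)\"");
Matcher m = pattern.matcher(sb.toString());
while (m.find()) {
    // group(1) is the URL only, without href=" and without the closing quote,
    // so no trimming is needed (trimming the last character would cut off the final '/').
    String str = m.group(1);
    System.out.println(str);
}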
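
A self-contained sketch of the regex approach, in case it helps to run it directly. It uses a capturing group so that group(1) holds just the URL. The class name HrefExtractor and the method extractHrefs are made up for illustration, the sample input is copied from the question, and it assumes every href value is wrapped in double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original post.
class HrefExtractor {
    // Matches href="..." and captures only the part between the double quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Returns every double-quoted href value found in the text, in order of appearance.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample lines taken from the question.
        String sample =
            "<a class=l href=\"http://www.ntnu.edu.tw/art/\" onmousedown=\"\n"
          + "<a class=l href=\"http://203.71.53.40/\" onmousedown=\"\n"
          + "<a class=l href=\"http://www.scdxart.com/\" onmousedown=\"return\n";
        for (String url : extractHrefs(sample)) {
            System.out.println(url);
        }
    }
}
```

Single-quoted or unquoted href values would not match this pattern; that is one of the reasons a real parser tends to be more robust.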

Instead of a regular expression,
an HTML parser may be the more appropriate and cleaner solution.
If you are interested, see http://htmlparser.sourceforge.net/
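
To give a rough idea of what the parser route looks like, here is a sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) rather than the htmlparser library linked above, since I don't want to guess at that library's API. The class name LinkLister and the method listLinks are invented for this example, and the input is assumed to be reasonably well-formed HTML.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration; not part of the original post.
class LinkLister {
    // Collects the href attribute of every <a> start tag the parser reports.
    static List<String> listLinks(String html) {
        final List<String> urls = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        urls.add(href.toString());
                    }
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory string, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<a class=l href=\"http://www.ntnu.edu.tw/art/\">art</a>"
            + "<a class=l href=\"http://203.71.53.40/\">ip</a>"
            + "</body></html>";
        for (String url : listLinks(page)) {
            System.out.println(url);
        }
    }
}
```

The parser handles the quoting rules and the other attributes itself, so the code only has to ask for the href of each anchor tag.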

import java.util.regex.*;
import java.io.*;
public class Test1 {
    public static void main(String[] args) throws Exception {
        StringBuffer sb = new StringBuffer();
        BufferedReader br = new BufferedReader(new FileReader("c:/work/112.txt"));
        String line = "";
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        Pattern pattern = Pattern.compile(" "); // What should go in here to extract the URLs I want?
        Matcher m = pattern.matcher(sb.toString());
        while (m.find()) {
            String str = m.group();
            System.out.println(str);
        }
    }
}
Suppose infile.txt contains the following:
<a class=l href="http://www.ntnu.edu.tw/art/" onmousedown="
<a class=l href="http://203.71.53.40/" onmousedown="
<a class=l href="http://www.scdxart.com/" onmousedown="return
I only want the URLs inside, like this:
http://www.ntnu.edu.tw/art/
http://203.71.53.40/
http://www.scdxart.com/
Pattern pattern = Pattern.compile(" ");
What should go inside this line?
This is as far as I have gotten, but it is still not right:
Pattern pattern = Pattern.compile("http://[^\"\\s]+");
