SearchCrawler

Uploaded by: 刘纯波 | Upload date: 2015-04-21

A web crawler written with Java Swing, from the book "Java编程艺术" (The Art of Java).

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// The Search Web Crawler
public class SearchCrawler extends JFrame {
  // Max URLs drop down values.
  private static final String[] MAX_URLS = {"50", "100", "500", "1000"};

  // Cache of robot disallow lists.
  private HashMap disallowListCache = new HashMap();

  // Search GUI controls.
  private JTextField startTextField;
  private JComboBox maxComboBox;
  private JCheckBox limitCheckBox;
  private JTextField logTextField;
  private JTextField searchTextField;
  private JCheckBox caseCheckBox;
  private JButton searchButton;

  // Search stats GUI controls.
  private JLabel crawlingLabel2;
  private JLabel crawledLabel2;
  private JLabel toCrawlLabel2;
  private JProgressBar progressBar;
  private JLabel matchesLabel2;

  // Table listing search matches.
  private JTable table;

  // Flag for whether or not crawling is underway.
  private boolean crawling;

  // Matches log file print writer.
  private PrintWriter logFileWriter;

  // Constructor for Search Web Crawler.
  public SearchCrawler() {
    // Set application title.
    setTitle("Search Crawler");

    // Set window size.
    setSize(600, 600);

    // Handle window closing events.
    addWindowListener(new WindowAdapter() {
      public void windowClosing(WindowEvent e) {
        actionExit();
      }
    });

    // Set up file menu.
    JMenuBar menuBar = new JMenuBar();
    JMenu fileMenu = new JMenu("File");
    fileMenu.setMnemonic(KeyEvent.VK_F);
    JMenuItem fileExitMenuItem = new JMenuItem("Exit", KeyEvent.VK_X);
    fileExitMenuItem.addActionListener(new ActionListener() {
      public void actionPerformed(ActionEvent e) {
        actionExit();
      }
    });
    fileMenu.add(fileExitMenuItem);
    menuBar.add(fileMenu);
    setJMenuBar(menuBar);

    // Set up search panel.
    JPanel searchPanel = new JPanel();
    GridBagConstraints constraints;
    GridBagLayout layout = new GridBagLayout();
    searchPanel.setLayout(layout);

    JLabel startLabel = new JLabel("Start URL:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(startLabel, constraints);
    searchPanel.add(startLabel);

    startTextField = new JTextField();
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(startTextField, constraints);
    searchPanel.add(startTextField);

    JLabel maxLabel = new JLabel("Max URLs to Crawl:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(maxLabel, constraints);
    searchPanel.add(maxLabel);

    maxComboBox = new JComboBox(MAX_URLS);
    maxComboBox.setEditable(true);
    constraints = new GridBagConstraints();
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(maxComboBox, constraints);
    searchPanel.add(maxComboBox);

    limitCheckBox = new JCheckBox("Limit crawling to Start URL site");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.WEST;
    constraints.insets = new Insets(0, 10, 0, 0);
    layout.setConstraints(limitCheckBox, constraints);
    searchPanel.add(limitCheckBox);

    JLabel blankLabel = new JLabel();
    constraints = new GridBagConstraints();
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    layout.setConstraints(blankLabel, constraints);
    searchPanel.add(blankLabel);

    JLabel logLabel = new JLabel("Matches Log File:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(logLabel, constraints);
    searchPanel.add(logLabel);

    String file = System.getProperty("user.dir")
      + System.getProperty("file.separator") + "crawler.log";
    logTextField = new JTextField(file);
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(logTextField, constraints);
    searchPanel.add(logTextField);

    JLabel searchLabel = new JLabel("Search String:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(searchLabel, constraints);
    searchPanel.add(searchLabel);

    searchTextField = new JTextField();
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.insets = new Insets(5, 5, 0, 0);
    constraints.gridwidth = 2;
    constraints.weightx = 1.0d;
    layout.setConstraints(searchTextField, constraints);
    searchPanel.add(searchTextField);

    caseCheckBox = new JCheckBox("Case Sensitive");
    constraints = new GridBagConstraints();
    constraints.insets = new Insets(5, 5, 0, 5);
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    layout.setConstraints(caseCheckBox, constraints);
    searchPanel.add(caseCheckBox);

    searchButton = new JButton("Search");
    searchButton.addActionListener(new ActionListener() {
      public void actionPerformed(ActionEvent e) {
        actionSearch();
      }
    });
    constraints = new GridBagConstraints();
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 5, 5);
    layout.setConstraints(searchButton, constraints);
    searchPanel.add(searchButton);

    JSeparator separator = new JSeparator();
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 5, 5);
    layout.setConstraints(separator, constraints);
    searchPanel.add(separator);

    JLabel crawlingLabel1 = new JLabel("Crawling:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(crawlingLabel1, constraints);
    searchPanel.add(crawlingLabel1);

    crawlingLabel2 = new JLabel();
    crawlingLabel2.setFont(
      crawlingLabel2.getFont().deriveFont(Font.PLAIN));
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(crawlingLabel2, constraints);
    searchPanel.add(crawlingLabel2);

    JLabel crawledLabel1 = new JLabel("Crawled URLs:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(crawledLabel1, constraints);
    searchPanel.add(crawledLabel1);

    crawledLabel2 = new JLabel();
    crawledLabel2.setFont(
      crawledLabel2.getFont().deriveFont(Font.PLAIN));
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(crawledLabel2, constraints);
    searchPanel.add(crawledLabel2);

    JLabel toCrawlLabel1 = new JLabel("URLs to Crawl:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(toCrawlLabel1, constraints);
    searchPanel.add(toCrawlLabel1);

    toCrawlLabel2 = new JLabel();
    toCrawlLabel2.setFont(
      toCrawlLabel2.getFont().deriveFont(Font.PLAIN));
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(toCrawlLabel2, constraints);
    searchPanel.add(toCrawlLabel2);

    JLabel progressLabel = new JLabel("Crawling Progress:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 0, 0);
    layout.setConstraints(progressLabel, constraints);
    searchPanel.add(progressLabel);

    progressBar = new JProgressBar();
    progressBar.setMinimum(0);
    progressBar.setStringPainted(true);
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 0, 5);
    layout.setConstraints(progressBar, constraints);
    searchPanel.add(progressBar);

    JLabel matchesLabel1 = new JLabel("Search Matches:");
    constraints = new GridBagConstraints();
    constraints.anchor = GridBagConstraints.EAST;
    constraints.insets = new Insets(5, 5, 10, 0);
    layout.setConstraints(matchesLabel1, constraints);
    searchPanel.add(matchesLabel1);

    matchesLabel2 = new JLabel();
    matchesLabel2.setFont(
      matchesLabel2.getFont().deriveFont(Font.PLAIN));
    constraints = new GridBagConstraints();
    constraints.fill = GridBagConstraints.HORIZONTAL;
    constraints.gridwidth = GridBagConstraints.REMAINDER;
    constraints.insets = new Insets(5, 5, 10, 5);
    layout.setConstraints(matchesLabel2, constraints);
    searchPanel.add(matchesLabel2);

    // Set up matches table.
    table = new JTable(new DefaultTableModel(new Object[][]{},
        new String[]{"URL"}) {
      public boolean isCellEditable(int row, int column) {
        return false;
      }
    });

    // Set up matches panel.
    JPanel matchesPanel = new JPanel();
    matchesPanel.setBorder(
      BorderFactory.createTitledBorder("Matches"));
    matchesPanel.setLayout(new BorderLayout());
    matchesPanel.add(new JScrollPane(table), BorderLayout.CENTER);

    // Add panels to display.
    getContentPane().setLayout(new BorderLayout());
    getContentPane().add(searchPanel, BorderLayout.NORTH);
    getContentPane().add(matchesPanel, BorderLayout.CENTER);
  }

  // Exit this program.
  private void actionExit() {
    System.exit(0);
  }

  // Handle search/stop button being clicked.
  private void actionSearch() {
    // If stop button clicked, turn crawling flag off.
    if (crawling) {
      crawling = false;
      return;
    }

    ArrayList errorList = new ArrayList();

    // Validate that start URL has been entered.
    String startUrl = startTextField.getText().trim();
    if (startUrl.length() < 1) {
      errorList.add("Missing Start URL.");
    }
    // Verify start URL.
    else if (verifyUrl(startUrl) == null) {
      errorList.add("Invalid Start URL.");
    }

    // Validate that max URLs is either empty or is a number.
    int maxUrls = 0;
    String max = ((String) maxComboBox.getSelectedItem()).trim();
    if (max.length() > 0) {
      try {
        maxUrls = Integer.parseInt(max);
      } catch (NumberFormatException e) {
      }
      if (maxUrls < 1) {
        errorList.add("Invalid Max URLs value.");
      }
    }

    // Validate that matches log file has been entered.
    String logFile = logTextField.getText().trim();
    if (logFile.length() < 1) {
      errorList.add("Missing Matches Log File.");
    }

    // Validate that search string has been entered.
    String searchString = searchTextField.getText().trim();
    if (searchString.length() < 1) {
      errorList.add("Missing Search String.");
    }

    // Show errors, if any, and return.
    if (errorList.size() > 0) {
      StringBuffer message = new StringBuffer();

      // Concatenate errors into single message.
      for (int i = 0; i < errorList.size(); i++) {
        message.append(errorList.get(i));
        if (i + 1 < errorList.size()) {
          message.append("\n");
        }
      }

      showError(message.toString());
      return;
    }

    // Remove "www" from start URL if present.
    startUrl = removeWwwFromUrl(startUrl);

    // Start the search crawler.
    search(logFile, startUrl, maxUrls, searchString);
  }

  private void search(final String logFile, final String startUrl,
      final int maxUrls, final String searchString) {
    // Start the search in a new thread.
    Thread thread = new Thread(new Runnable() {
      public void run() {
        // Show hour glass cursor while crawling is under way.
        setCursor(Cursor.getPredefinedCursor(Cursor.WAIT_CURSOR));

        // Disable search controls.
        startTextField.setEnabled(false);
        maxComboBox.setEnabled(false);
        limitCheckBox.setEnabled(false);
        logTextField.setEnabled(false);
        searchTextField.setEnabled(false);
        caseCheckBox.setEnabled(false);

        // Switch search button to "Stop."
        searchButton.setText("Stop");

        // Reset stats.
        table.setModel(new DefaultTableModel(new Object[][]{},
            new String[]{"URL"}) {
          public boolean isCellEditable(int row, int column) {
            return false;
          }
        });
        updateStats(startUrl, 0, 0, maxUrls);

        // Open matches log file.
        try {
          logFileWriter = new PrintWriter(new FileWriter(logFile));
        } catch (Exception e) {
          showError("Unable to open matches log file.");
          return;
        }

        // Turn crawling flag on.
        crawling = true;

        // Perform the actual crawling.
        crawl(startUrl, maxUrls, limitCheckBox.isSelected(),
          searchString, caseCheckBox.isSelected());

        // Turn crawling flag off.
        crawling = false;

        // Close matches log file.
        try {
          logFileWriter.close();
        } catch (Exception e) {
          showError("Unable to close matches log file.");
        }

        // Mark search as done.
        crawlingLabel2.setText("Done");

        // Enable search controls.
        startTextField.setEnabled(true);
        maxComboBox.setEnabled(true);
        limitCheckBox.setEnabled(true);
        logTextField.setEnabled(true);
        searchTextField.setEnabled(true);
        caseCheckBox.setEnabled(true);

        //
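
The listing breaks off at this point. As an aside, the input checks in actionSearch can be tried without the GUI. The following is only a minimal sketch under stated assumptions: the class name SearchInputValidator and the methods validate and parseHttpUrl are hypothetical names introduced here. The body of verifyUrl is not part of the listing, so parseHttpUrl is a stand-in that assumes only syntactically valid http:// URLs are accepted. The error messages and their order follow actionSearch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, GUI-free version of the checks performed in actionSearch.
class SearchInputValidator {

    // Assumed stand-in for verifyUrl (its real body is not in the listing):
    // returns a URL for syntactically valid http:// addresses, else null.
    static URL parseHttpUrl(String url) {
        if (!url.toLowerCase().startsWith("http://")) {
            return null;
        }
        try {
            return new URI(url).toURL();
        } catch (URISyntaxException | MalformedURLException
                 | IllegalArgumentException e) {
            return null;
        }
    }

    // Collects the same messages, in the same order, as actionSearch.
    static List<String> validate(String startUrl, String max,
                                 String logFile, String searchString) {
        List<String> errors = new ArrayList<>();

        String trimmedUrl = startUrl.trim();
        if (trimmedUrl.isEmpty()) {
            errors.add("Missing Start URL.");
        } else if (parseHttpUrl(trimmedUrl) == null) {
            errors.add("Invalid Start URL.");
        }

        // Max URLs may be empty; otherwise it must be a positive integer.
        String trimmedMax = max.trim();
        if (!trimmedMax.isEmpty()) {
            int maxUrls = 0;
            try {
                maxUrls = Integer.parseInt(trimmedMax);
            } catch (NumberFormatException e) {
                // Leave maxUrls at 0 so the check below reports the error.
            }
            if (maxUrls < 1) {
                errors.add("Invalid Max URLs value.");
            }
        }

        if (logFile.trim().isEmpty()) {
            errors.add("Missing Matches Log File.");
        }
        if (searchString.trim().isEmpty()) {
            errors.add("Missing Search String.");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Joins the messages with newlines, as the StringBuffer loop does.
        List<String> errors = validate("", "abc", "crawler.log", "java");
        System.out.println(String.join("\n", errors));
    }
}
```

Keeping the checks in a method that takes plain strings makes them easy to exercise in isolation; the Swing code would only need to pass in the field contents and show the joined message.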
