2022-06-06

OkHttp3 crawler + FastJson JSON parsing + Jsoup Document node parsing + writing text to a file

Crawler + JSON + Document + file writing

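//Imports required by the statements below
import java.io.File;
import java.io.FileOutputStream;
import com.alibaba.fastjson.JSONObject;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

//OkHttp only accepts URLs with an explicit http:// or https:// scheme
String url = "https://www.baidu.com";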

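//Use OkHttpClient from okhttp3 to send the request and obtain a Response (network request)
//Note 1: OkHttp does not allow Chinese characters in request headers. Fix: encode any header value that contains Chinese before adding it
//Note 2: if the Cookie is very long, copying and pasting its value by hand may truncate it, which makes the request fail or return the wrong result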
OkHttpClient httpClient = new OkHttpClient();
Request request = new Request.Builder()
    .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36")
    .url(url)
    .build();
Response response = httpClient.newCall(request).execute();
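
Note 1 above says that header values containing Chinese have to be encoded first. A minimal sketch of one way to do that, assuming percent-encoding with the JDK's `URLEncoder` is acceptable to the server being crawled (the class `HeaderEncoding` and both helper methods are hypothetical names, and the server has to decode the value the same way):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class HeaderEncoding {
    // Hypothetical helper: percent-encodes a value so that only ASCII characters remain.
    // URLEncoder follows form encoding rules, so a space becomes '+'.
    static String encodeHeaderValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Reverses encodeHeaderValue; included only to confirm the round trip.
    static String decodeHeaderValue(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "\u4e2d\u6587"; // two Chinese characters, written as Java escapes
        String encoded = encodeHeaderValue(raw);
        System.out.println(encoded);                                 // %E4%B8%AD%E6%96%87
        System.out.println(decodeHeaderValue(encoded).equals(raw));  // true
    }
}
```

The encoded string can then be passed to `addHeader` in place of the raw value.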

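//Use Alibaba's fastjson to convert between strings and JSON (JSON format)
//Note: if the response is JSON, parse it as JSON first and then extract the XML/HTML content from it; otherwise the string keeps its Unicode escape sequences and the Chinese text is not readable as UTF-8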
String responseString = response.body().string();
System.out.println(responseString);
JSONObject jsonObject = JSONObject.parseObject(responseString, JSONObject.class);
JSONObject jsonObject1 = jsonObject.getJSONObject("html");
String html = jsonObject1.getString("list");
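
The note above about Unicode escapes comes down to this: the raw response body carries Chinese text as `\uXXXX` sequences, and a JSON parser turns each sequence back into one character when it reads the string value. A hand-rolled sketch of that single decoding step follows. It is an illustration only, not how fastjson is implemented, and the class `UnicodeEscapeDemo` and its methods are hypothetical names:

```java
final class UnicodeEscapeDemo {
    // Illustration only: decodes backslash-u escapes and copies every other
    // escape sequence through unchanged. A real JSON parser handles all escapes.
    static String decodeUnicodeEscapes(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\' || i + 1 >= raw.length()) {
                out.append(c);   // ordinary character, or a trailing backslash
                i++;
            } else if (raw.charAt(i + 1) == 'u' && i + 6 <= raw.length() && isHex(raw, i + 2)) {
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;          // consumed the backslash, 'u', and four hex digits
            } else {
                out.append(c).append(raw.charAt(i + 1)); // some other escape: keep as-is
                i += 2;
            }
        }
        return out.toString();
    }

    // True if the four characters starting at 'start' are all hexadecimal digits.
    private static boolean isHex(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // What a raw body might look like before parsing (made-up sample data).
        String rawBody = "{\"title\":\"\\u4e2d\\u6587\"}";
        System.out.println(decodeUnicodeEscapes(rawBody));
    }
}
```

In practice the parser does this automatically, which is why reading the value through `getString` yields readable text while printing the raw body does not.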

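//Write the extracted content (a string) to a file
File file = new File("C:\\Users\\username\\Desktop\\New Text Document.txt");    //replace "username" with your own account name
//try-with-resources closes the file output stream even if the write fails
try (FileOutputStream fileOutputStream = new FileOutputStream(file)) {
    //encode explicitly as UTF-8 instead of relying on the platform default charset
    fileOutputStream.write(html.getBytes(java.nio.charset.StandardCharsets.UTF_8));
}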
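
The file-writing step above can also be done with `java.nio.file.Files`. The sketch below states the charset explicitly so that Chinese text is stored as UTF-8 whatever the platform default is. The class `Utf8FileWrite`, its methods, and the temp-file prefix are hypothetical names chosen for the example:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8FileWrite {
    // Writes text as UTF-8, creating the file or replacing its existing contents.
    static void writeUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole file back as UTF-8 text.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crawler-demo", ".txt");
        writeUtf8(tmp, "<h3>\u4e2d\u6587</h3>"); // made-up sample markup
        System.out.println(readUtf8(tmp));
        Files.delete(tmp);
    }
}
```

`Files.write` opens, writes, and closes the file in one call, so there is no stream left to close by hand.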

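//Use Jsoup to parse the string into a Document (XML/HTML node tree), which makes it easy to read the content of specific tags
Document document = Jsoup.parse(html);
Elements headings = document.getElementsByTag("h3");
Element element = headings.get(0);              //get the first element in the collection (throws IndexOutOfBoundsException if no h3 was found)
System.out.println(element.text());             //get the text content inside the tag
System.out.println(element.nodeName());         //get the tag name
System.out.println(element.tagName());          //get the tag name
element.tagName("lry");     //change the element's tag name
System.out.println(element.isBlock());          //test whether the element is a block-level element
System.out.println(element.parent());           //get the parent node
System.out.println(element.parents());          //get the element's parent and ancestors up to the document root, returned as a stack with the closest parent first
System.out.println(element.children());         //get the element's child elements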

Title:OkHttp3 crawler + FastJson JSON parsing + Jsoup Document node parsing + writing text to a file

Author:

Created:2022-06-06, 20:53:48

Updated:2023-09-30, 02:17:39

Full URL:http://example.com/2022/06/06/OkHttp3%E7%88%AC%E8%99%AB-FastJson%E8%A7%A3%E6%9E%90json-Jsoup%E8%A7%A3%E6%9E%90Document%E8%8A%82%E7%82%B9+%E6%96%87%E6%9C%AC%E5%86%99%E5%85%A5%E6%96%87%E4%BB%B6/

License: "CC BY-NC-SA 4.0" Keep Link & Author if Distribute.