Python Example Introduction - 伙伴云

Reader submission 782 2022-05-30

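1. Overview

1.1 Introduction

This example uses Python with requests and BeautifulSoup to scrape data from the Shodan search engine. It filters for malware IPs, then narrows the search by country, service, organization, and product to collect detailed information on each IP. The approach is to log in by sending a request with form data built through requests, fetch the page source, and then filter that source with BeautifulSoup and regular expressions to extract the fields we need.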

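2. Background

2.1 About Shodan

Shodan is a search engine. Unlike Google, Shodan does not search the web for URLs; it goes straight to the back channels of the Internet. It can be described as a "dark" Google that is constantly looking for every server, camera, printer, router, and other device connected to the Internet. Each month Shodan collects information around the clock on roughly 500 million servers.

The information Shodan collects is startling. Traffic lights, security cameras, home automation devices, and heating systems that are connected to the Internet can all be found with little effort. Shodan users have found the control system of a water park, a gas station, and even a hotel's wine cooler. Researchers have also used Shodan to locate command and control systems for nuclear power plants and a particle-accelerating cyclotron.

What makes Shodan notable is that it can find almost anything connected to the Internet. What makes it alarming is that very few of these devices have any security protection in place, so they can be accessed at will.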

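2.2 About Requests

requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests instead of urllib2? The official documentation explains it this way:

Python's standard library urllib2 provides most of the HTTP functionality you need, but its API is painful to use: even a simple task takes a pile of code. Being lazy, I personally prefer the third-party requests library.

A brief look at basic Requests usage: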

# Import the module
import requests

# Send a GET request
r = requests.get('http://www.zhidaow.com')

# Get the page source (text is an attribute, not a method)
r.text

# Disable redirects
r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)

# Get the HTTP status code to check whether the request succeeded
r.status_code

# Get the response headers
r.headers

# Send POST, PUT, DELETE, HEAD, and OPTIONS requests
# r = requests.post("http://httpbin.org/post")
# r = requests.put("http://httpbin.org/put")
# r = requests.delete("http://httpbin.org/delete")
# r = requests.head("http://httpbin.org/get")
# r = requests.options("http://httpbin.org/get")

More usage examples are available at http://www.zhidaow.com/post/python-requests-install-and-brief-introduction
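The scraper later in this article passes its login fields to requests.post through the params argument. As a rough illustration of how requests treats params and data differently, the sketch below prepares two POST requests without sending them. The helper name prepare_login and the placeholder credentials are assumptions made for this example only; Request(...).prepare() is part of the requests API.

```python
import requests

LOGIN_URL = 'https://account.shodan.io/login'  # login URL used by the scraper


def prepare_login(fields, as_form):
    """Build (but do not send) a POST request carrying the given fields.

    prepare_login is a hypothetical helper for this illustration.
    """
    if as_form:
        # data= puts the fields in the request body as form data
        req = requests.Request('POST', LOGIN_URL, data=fields)
    else:
        # params= appends the fields to the URL as a query string
        req = requests.Request('POST', LOGIN_URL, params=fields)
    return req.prepare()


fields = {'username': 'user', 'password': 'secret'}  # placeholder values

form_req = prepare_login(fields, as_form=True)
query_req = prepare_login(fields, as_form=False)

print(form_req.url)    # URL is unchanged
print(form_req.body)   # fields are form-encoded in the body
print(query_req.url)   # fields appear in the URL
print(query_req.body)  # no body is sent
```

Which of the two a given login endpoint accepts depends on the server, so this only shows what each argument puts on the wire.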

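2.3 About BeautifulSoup

BeautifulSoup is a Python library for pulling data out of HTML and XML files. It works with your preferred parser to provide idiomatic ways of navigating, searching, and modifying the document. Beautiful Soup can save you hours or even days of work.

A brief look at basic BeautifulSoup usage: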

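# First define some HTML text
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""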

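# Import the BeautifulSoup module
from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html_doc, 'html.parser')

# A few simple ways to navigate the parsed structure
soup.title  # the <title> tag
# <title>The Dormouse's story</title>

soup.title.name
# u'title'

soup.title.string  # the text inside the <title> tag
# u"The Dormouse's story"

soup.title.parent.name  # name of the <title> tag's parent node
# u'head'

soup.p  # the first <p> tag only
# <p class="title"><b>The Dormouse's story</b></p>

soup.p['class']  # class attribute of that <p> tag (multi-valued, so a list)
# [u'title']

soup.a  # the first <a> tag only
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')  # every <a> tag
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>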

Extract the URLs of all the <a> tags in the document:

for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

Extract all the text from the document:

print(soup.get_text())
# The Dormouse's story
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
# ...

That covers the basics of BeautifulSoup. I like to combine BeautifulSoup with regular expressions, which saves both time and effort.
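As a minimal sketch of combining the two, the example below lets BeautifulSoup select tags by a regular expression on an attribute and then applies a second regular expression to the link text. The sample markup is hypothetical and only loosely shaped like a result list; it is not actual Shodan HTML.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration; not real Shodan markup
sample_html = """
<div class="result">
  <a href="/host/192.0.2.10">192.0.2.10</a>
  <a href="/host/192.0.2.10">Details</a>
  <span>Added on 2022-05-30</span>
</div>
<div class="result">
  <a href="/host/198.51.100.7">198.51.100.7</a>
  <a href="/about">About</a>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# BeautifulSoup narrows the search to <a> tags whose href matches the regex
host_links = soup.find_all('a', href=re.compile(r'^/host/'))

# A second regex keeps only link texts that look like IPv4 addresses
ipv4 = re.compile(r'^\d{1,3}(\.\d{1,3}){3}$')
ips = [a.get_text() for a in host_links if ipv4.match(a.get_text())]

# string= matches text nodes instead of tags
added_on = soup.find_all(string=re.compile(r'Added on'))

print(ips)
print(added_on)
```

BeautifulSoup handles the document structure, while the regular expressions handle the free-form text inside it.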

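2.4 Environment setup

1. Set up the Python environment

1) Install Python from the installer

Download: https://www.python.org/downloads/release/python-2711/

2) Add the install path to the environment variables

Variable name: Path

Variable value: C:\Python27 (the Python install path)

3) Verify that Python is configured

Run python -V in cmd. The following output means the setup is complete:

C:\Users\Administrator>python -V

Python 2.7.11

2. Install pip

Open C:\Python27\Scripts and check whether the Python installation includes the pip files. If it does, add C:\Python27\Scripts to the system environment variables, then open a CMD window and run pip to confirm that it is available. The script also needs the two third-party packages, which can be installed with pip install requests beautifulsoup4.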
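Once pip works, a quick way to confirm that the third-party modules are importable is a small check like the one below. This is only a sketch; missing_modules is a hypothetical helper name, and it relies on nothing beyond the built-in __import__.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# requests and bs4 are the two third-party modules the scraper imports;
# an empty list means both are installed
print(missing_modules(['requests', 'bs4']))
```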

# -*- coding: utf-8 -*-
# Scraper script, written for the Python 2.7 environment set up above
import re
import sys
import time

import requests
from bs4 import BeautifulSoup

# Python 2 only: setdefaultencoding is hidden at startup, so reload sys first
reload(sys)
sys.setdefaultencoding('utf-8')

ports=[2280,1604,1177,16464,443]
organizations=['George+Mason+University','Enzu','Psychz+Networks','Turk+Telekom','HOPUS+SAS']
products=['Gh0st+RAT+trojan','DarkComet+trojan','njRAT+trojan','ZeroAccess+trojan','XtremeRAT+trojan']
page_nums=range(1,6)
countries=['AE','AF','AL','AM','AO','AR','AT','AU','AZ','BD','BE','BF','BG','BH','BI','BJ','BL','BN','BO','BR',
    'BW','BY','CA','CF','CG','CH','CL','CM','CN','CO','CR','CS','CU','CY','DE','DK','DO','DZ','EC','EE','EG','ES',
    'ET','FI','FJ','FR','GA','GB','GD','GE','GH','GN','GR','GT','HK','HN','HU','ID','IE','IL','IN','IQ','IR','IS',
    'IT','JM','KG','KH','KP','KR','KT','KW','KZ','LA','LB','LC','LI','LK','LR','LT','LU','LV','LY','MA','MC','MD',
    'MG','ML','MM','MN','MO','MT','MU','MW','MX','MY','MZ','NA','NE','NG','NI','NL','NO','NP','NZ','OM','PA','PE',
    'PG','PH','PK','PL','PT','PY','QA','RO','RU','SA','SC','SD','SE','SG','SI','SK','SM','SN','SO','SY','SZ','TD',
    'TG','TH','TJ','TM','TN','TR','TW','TZ','UA','UG','US','UY','UZ','VC','VE','VN','YE','YU','ZA','ZM','ZR','ZW']

def login_form(continue_url):
    # Login fields; 'continue' names the page to load after logging in
    return {
        'username':'xxxxxxxxxx',  # Shodan account name
        'password':'xxxxxxxxxx',  # Shodan password
        'grant_type':'password',
        'continue':continue_url,
        'login_submit':'Log in'
    }

class Get_IP():
    # Logs in and collects the IPs listed on each search result page

    def __init__(self,url,headers):
        self.url=url
        self.headers=headers

    def get_info(self,datas):
        ip_list=[]
        searchfor_list=[]
        addon_list=[]
        pattern=re.compile(r'(.*?)')  # not used below
        # Captures the query text from the search box's value attribute
        pattern0=re.compile(r'value=\'(.*?)\'/')
        for data in datas:
            # The login fields are passed via params, i.e. as a URL query string
            req=requests.post(self.url,params=data,headers=self.headers)
            html=req.text
            soup=BeautifulSoup(html,'html.parser')
            Add_on=soup.find_all(text=re.compile(r'Added\ on'))
            text_b=''
            for i in soup.select('input[id="search_input"]'):
                match=re.search(pattern0,str(i))
                if match:
                    text_b=match.group(1)
            for i in Add_on:
                addon_list.append(i)
            # Each result links to /host/<ip> twice; skip the "Details" link
            for i in soup.find_all(href=re.compile(r'/host/')):
                if 'Details' not in i.get_text():
                    ip_list.append(i.get_text())
                    searchfor_list.append(text_b)
        return ip_list,searchfor_list,addon_list

class Get_Ip_Info():
    # Opens the details page of each IP and collects its fields

    def __init__(self,url,headers):
        self.url=url
        self.headers=headers

    def get_ip_info(self,url_infos):
        information=[]
        pattern=re.compile(r'content="Ports open:(.*?)"')
        for data in url_infos:
            req_ip=requests.post(self.url,params=data,headers=self.headers)
            html_ip=req_ip.text
            soup=BeautifulSoup(html_ip,'html.parser')
            tag_text=[i.get_text() for i in soup.find_all('th')]
            tag_content=[i.get_text() for i in soup.find_all('td')]
            open_ports='NULL'  # used when the page has no ports description
            for i in soup.select('meta[name="twitter:description"]'):
                match=re.search(pattern,str(i))
                if match:
                    open_ports=match.group(1)
            # Keys come from the <td> cells and values from the <th> cells
            info=dict(zip(tag_content,tag_text))
            info['Ports']=open_ports
            # str.strip() removes characters rather than a prefix, so split instead
            info['Ip']=data['continue'].split('/host/')[-1]
            information.append(info)
        return information

if __name__=="__main__":
    headers={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"}
    url='https://account.shodan.io/login'
    file_path=r'C:\Users\Administrator\Downloads\shoudan%s.txt'%time.strftime('%Y-%m-%d')

    # Build the search URLs: pages 1 to 5 for every country, port, organization and product
    base='https://www.shodan.io/search?query=category%3A%22malware%22+'
    all_url=[]
    for key,values in (('country',countries),('port',ports),('org',organizations),('product',products)):
        for value in values:
            for page_num in page_nums:
                all_url.append(base+'%s%%3A%%22%s%%22&page=%d'%(key,value,page_num))

    # Step 1: collect the IPs from the search result pages
    datas=[login_form(continue_url) for continue_url in all_url]
    app=Get_IP(url,headers)
    ip_list,searchfor_list,addon_list=app.get_info(datas)

    # Step 2: fetch the details page of every IP
    url_infos=[login_form('https://www.shodan.io/host/%s'%ip) for ip in ip_list]
    app=Get_Ip_Info(url,headers)
    information=app.get_ip_info(url_infos)

    # Step 3: write one line per IP to the daily log file
    with open(file_path,'wb') as file_w:
        for search_for,info,add_on in zip(searchfor_list,information,addon_list):
            # Fields missing from a details page are written as NULL
            city_info=str(info.get('City','NULL'))
            ports_info=str(info.get('Ports','NULL'))
            country_info=str(info.get('Country','NULL'))
            hostnames_info=str(info.get('Hostnames','NULL'))
            word=search_for+' ||'+country_info+' '+city_info+' ||'+hostnames_info+' ||'+info['Ip']+'||'+ports_info+' ||'+add_on
            file_w.write(word+'\r\n')
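Two small parsing steps in the script are easy to get subtly wrong: reading the port list out of the meta description, and cutting the IP out of the host URL. The sketch below isolates both so they can be tried without any network access. The function names and the sample meta tag are assumptions made for this illustration; the real page content may be formatted differently.

```python
import re

# Same pattern the script applies to the twitter:description meta tag
PORTS_PATTERN = re.compile(r'content="Ports open:(.*?)"')
HOST_PREFIX = 'https://www.shodan.io/host/'


def parse_open_ports(meta_tag):
    """Return the port numbers listed in a meta tag string, or [] if none."""
    match = PORTS_PATTERN.search(meta_tag)
    if match is None:
        return []
    return [int(p) for p in match.group(1).split(',') if p.strip()]


def ip_from_host_url(url):
    """Return the part of a host URL that follows the fixed prefix."""
    if url.startswith(HOST_PREFIX):
        return url[len(HOST_PREFIX):]
    return url


# Hypothetical meta tag used only for this example
sample = '<meta content="Ports open: 80, 443, 1604" name="twitter:description"/>'
print(parse_open_ports(sample))
print(ip_from_host_url('https://www.shodan.io/host/192.0.2.10'))
```

Slicing off a known prefix is more predictable than str.strip, which treats its argument as a set of characters to remove from both ends rather than as a prefix.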

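2.6 Features

The script collects the information Shodan returns for a set of search conditions and produces one log file per day. The initial condition is category:"malware", a search for malware. It is then refined by country, service, organization, and product to collect every IP address involved. Finally, the script follows each IP address to its details page and returns the detailed data for that IP (Ports, Ip, Hostname, Country, City).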
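The search conditions described above map directly onto the query string of the search URL. As a rough sketch, the URL can be built with the standard library instead of hand-written percent escapes. This version is written for Python 3; on Python 2.7 urlencode lives in the urllib module. The helper name build_search_url is an assumption made for this example.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://www.shodan.io/search'


def build_search_url(filter_name, value, page):
    """Return a search URL for category:"malware" narrowed by one filter."""
    query = 'category:"malware" %s:"%s"' % (filter_name, value)
    # urlencode escapes ':' as %3A, '"' as %22 and a space as '+'
    return SEARCH_URL + '?' + urlencode({'query': query, 'page': page})


print(build_search_url('country', 'CN', 1))
print(build_search_url('org', 'Turk Telekom', 2))
```

With this approach the filter values are passed in plain form, for example 'Turk Telekom' rather than the pre-encoded 'Turk+Telekom' used in the script's lists.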
