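Over the past year or so, plenty of new faces have shown up, many new pp maps have been ranked, and quite a few players have quit. The rankings of the active top players have shifted considerably, so I felt it was worth scraping the data again.
Collecting the data
ppy changed a few things, so the code has changed slightly as well. This run also scrapes a lot of new fields, which should allow for some more interesting analysis.
Heads-up: this post only scrapes the top 1000 players on the leaderboard; retired players who are no longer on the leaderboard cannot be scraped. The method used here to tell 4k players from 7k players is also quite crude and may well be inaccurate. Treat the results as a rough reference, just for fun.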
```python
import json
import random
import time
import requests
import csv
import logging
import statistics
from bs4 import BeautifulSoup

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S')


def random_sleep():
    """Pause 1-3 seconds between profile requests."""
    seconds = random.randint(1, 3)
    logging.info(f"Sleep for {seconds} s")
    time.sleep(seconds)


mode_name = ["mania"]
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
start_page = 1  # 20 pages x 50 players per page = top 1000
end_page = 20

fieldnames = ['uid', 'rank', 'rank_highest', 'rank_avg_90d', 'username', 'country_code', 'country_name',
              'accuracy', 'play_count', 'main_keymode', 'performance', 'pp_4k', 'pp_7k',
              'SS_count', 'SSH_count', 'S_count', 'SH_count', 'A_count',
              'count_300', 'count_100', 'count_50', 'count_miss',
              'is_supporter', 'has_supported', 'support_level', 'is_active', 'last_visit',
              'name_change_count', 'post_count', 'comments_count', 'kudosu_available', 'kudosu_total',
              'friend_count', 'follower_count', 'badges_count', 'level', 'level_progress',
              'ranked_score', 'play_time', 'total_score', 'total_hits', 'maximum_combo',
              'replays_count', 'user_achievements_count', 'avatar_url', 'cover_url', 'title', 'join_date',
              'is_admin', 'is_bng', 'is_full_bn', 'is_gmt', 'is_limited_bn', 'is_moderator', 'is_nat',
              'is_restricted', 'is_silenced']

# Create the CSV and write the header once
with open('data.csv', 'w', encoding="utf-8", newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

for mode in mode_name:
    for pg in range(start_page, end_page + 1):
        logging.info(f"Getting page {pg} for mode {mode}")
        data_lst = []
        url = f"https://osu.ppy.sh/rankings/{mode}/performance"
        payload = {"page": f"{pg}"}
        r = requests.get(url=url, params=payload, headers=headers)
        if r.status_code != 200:
            raise requests.exceptions.HTTPError("The response code is not 200. Something's wrong!")
        webdata = r.text
        soup = BeautifulSoup(webdata, "lxml")
        uid_list = soup.find_all("a", class_="ranking-page-table__user-link-text js-usercard")
        stat_list = soup.find_all("td", class_="ranking-page-table__column")
        for idx in range(len(uid_list)):
            uid = uid_list[idx]['data-user-id']
            info_url = f"https://osu.ppy.sh/users/{uid}/mania"
            info_resp = requests.get(url=info_url, headers=headers)
            if info_resp.status_code != 200:
                raise requests.exceptions.HTTPError("The response code is not 200. Something's wrong!")
            info_text = info_resp.text
            info_soup = BeautifulSoup(info_text, "lxml")
            # The profile page embeds the user's data as JSON in a data attribute
            info_raw = info_soup.find("div", class_="js-react--profile-page osu-layout osu-layout--full")['data-initial-data']
            info_data = json.loads(info_raw)
            user = info_data["user"]
            stats = user["statistics"]

            country_code = user["country"]["code"]
            country_name = user["country"]["name"]
            is_supporter = user["is_supporter"]
            has_supported = user["has_supported"]
            support_level = user["support_level"]
            avatar_url = user["avatar_url"]
            cover_url = user["cover_url"]
            is_active = user["is_active"]
            last_visit = user["last_visit"]
            title = user["title"]
            join_date = user["join_date"]
            is_admin = user["is_admin"]
            is_bng = user["is_bng"]
            is_full_bn = user["is_full_bn"]
            is_gmt = user["is_gmt"]
            is_limited_bn = user["is_limited_bn"]
            is_moderator = user["is_moderator"]
            is_nat = user["is_nat"]
            is_restricted = user["is_restricted"]
            is_silenced = user["is_silenced"]
            name_change_count = len(user["previous_usernames"])
            post_count = user["post_count"]
            comments_count = user["comments_count"]
            kudosu_available = user["kudosu"]["available"]
            kudosu_total = user["kudosu"]["total"]
            # "friend_count" here is the number of people following the player
            friend_count = user["follower_count"]
            follower_count = user["mapping_follower_count"]
            badges_count = len(user["badges"])
            rank = stats["global_rank"]
            rank_highest = user["rank_highest"]["rank"]
            rank_avg_90d = statistics.fmean(user["rank_history"]["data"])
            username = user["username"]
            accuracy = stats["hit_accuracy"]
            play_count = stats["play_count"]
            performance = stats["pp"]
            SS_count = stats["grade_counts"]["ss"]
            SSH_count = stats["grade_counts"]["ssh"]
            S_count = stats["grade_counts"]["s"]
            SH_count = stats["grade_counts"]["sh"]
            A_count = stats["grade_counts"]["a"]
            count_300 = stats["count_300"]
            count_100 = stats["count_100"]
            count_50 = stats["count_50"]
            count_miss = stats["count_miss"]
            level = stats["level"]["current"]
            level_progress = stats["level"]["progress"]
            ranked_score = stats["ranked_score"]
            play_time = stats["play_time"]
            total_score = stats["total_score"]
            total_hits = stats["total_hits"]
            maximum_combo = stats["maximum_combo"]
            replays_count = stats["replays_watched_by_others"]
            user_achievements_count = len(user["user_achievements"])
            # Crude keymode detection: variants[0] is taken as 4k, variants[1] as 7k
            try:
                pp_4k = int(stats["variants"][0]["pp"])
                pp_7k = int(stats["variants"][1]["pp"])
                if pp_4k > pp_7k:
                    main_keymode = "4k"
                else:
                    main_keymode = "7k"
            except (KeyError, IndexError, TypeError, ValueError):
                pp_4k = pp_7k = main_keymode = "N/A"

            # Every field name matches a variable defined above
            player_pkg = {}
            for item in fieldnames:
                player_pkg[item] = eval(item)
            data_lst.append(player_pkg)
            logging.info(f"#{rank} {username} {uid} done!")
            random_sleep()

        # Append this page's players to the CSV
        with open('data.csv', 'a', encoding="utf-8", newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            for data in data_lst:
                writer.writerow(data)
```
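The 4k/7k split at the end of the scraper is the crude part mentioned in the heads-up, so here is the same rule pulled out into a small function as a sketch. `classify_keymode` is a name made up for this illustration, and like the scraper it simply assumes that the first entry of `variants` is 4k and the second is 7k. The sample values are invented.

```python
def classify_keymode(variants):
    """Return (pp_4k, pp_7k, main_keymode) using the scraper's rule.

    Assumption carried over from the scraper: variants[0] is 4k and
    variants[1] is 7k. Anything that cannot be parsed becomes "N/A".
    """
    try:
        pp_4k = int(variants[0]["pp"])
        pp_7k = int(variants[1]["pp"])
    except (IndexError, KeyError, TypeError, ValueError):
        return "N/A", "N/A", "N/A"
    # A tie is counted as 7k, exactly as in the scraper
    return pp_4k, pp_7k, "4k" if pp_4k > pp_7k else "7k"


# Invented sample values, not real player data
print(classify_keymode([{"pp": 15227.4}, {"pp": 14694.9}]))
print(classify_keymode([{"pp": 9000}, {"pp": 9000}]))
print(classify_keymode([]))
```

The rule only compares two pp numbers, so a player with nearly equal 4k and 7k pp lands on one side more or less arbitrarily, which is why the keymode results should be read loosely.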
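With the data in hand, let's do some visual analysis with pandas and matplotlib~
The code is mostly the same as before, with a few changes. To make it easy for others to reproduce, all of it is still provided.
0x00 - Countries and regions
Let's take a quick look at where the top players come from~
First, import the necessary libraries and read the data
```python
import random
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import pandas as pd
import numpy as np
from cycler import cycler

df = pd.read_csv('data.csv')
df
```
First, the regional breakdown. As before, countries and regions with fewer than 20 players are all grouped into Other.
```python
df_draw = df.groupby('country_code').size().to_frame(name='count')
df_draw = df_draw.sort_values('count', ascending=False).reset_index()
df_draw
```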
| country_code | count |
| --- | --- |
| KR | 237 |
| CN | 116 |
| US | 87 |
| JP | 57 |
| ID | 45 |
| PH | 44 |
| PE | 24 |
| TH | 24 |
| CA | 23 |
| CL | 21 |
| Other | 322 |
The pie chart code is as follows
```python
def shift_row_to_bottom(df, index_to_shift):
    """Shift row, given by index_to_shift, to bottom of df."""
    idx = df.index.tolist()
    idx.pop(index_to_shift)
    df = df.reindex(idx + [index_to_shift])
    return df


def my_autopct(pct):
    # Only label slices larger than 4%
    return ('%1.1f%%' % pct) if pct > 4 else ''


df_draw.loc[df_draw['count'] < 20, 'country_code'] = 'Other'
df_draw = df_draw.groupby('country_code')['count'].sum().reset_index()
df_draw = df_draw.sort_values('count', ascending=False, ignore_index=True)
# Look up where 'Other' ended up after sorting instead of hard-coding its position
other_idx = int(df_draw.index[df_draw['country_code'] == 'Other'][0])
df_draw = shift_row_to_bottom(df_draw, other_idx)

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

plt.pie(df_draw['count'],
        labels=df_draw['country_code'],
        autopct=my_autopct,
        startangle=140)

plt.title("osu!mania top #1000 country code (2024)")

plt.show()
```
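China, Japan, Korea and the US together still make up about half of the list (497 of 1000), followed by Indonesia, the Philippines, Peru and Thailand. The number of Korean top players has shrunk badly, while the Other category has grown noticeably. The order of the remaining countries is basically unchanged, but Argentina (AR) and the United Kingdom (GB) have been replaced by Canada (CA) and Peru (PE). Malaysia (MY), a long-standing powerhouse, has dropped out of the chart as well.
0x01 - The keymode battle
In 2024 the pp ceiling for 7k is still far higher than for 4k, but a lot of high-star 4k pp maps were ranked a while ago, so this area has presumably changed quite a bit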
| main_keymode | count_2024 | count_2022 |
| --- | --- | --- |
| 7k | 694 | 631 |
| 4k | 306 | 369 |
As you can see, even though so many 4k pp maps were ranked, there are now fewer 4k players (-63). Yige (一哥) says the main reason is that a single 7k pp map is worth several 4k ones.
A quick donut chart
```python
# Count players per main keymode
df_draw = df.groupby('main_keymode').size().to_frame(name='count').reset_index()
df_draw = df_draw.sort_values('count', ascending=False, ignore_index=True)

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

explode = [0.05, 0.05]

plt.pie(df_draw['count'], labels=df_draw['main_keymode'], autopct='%1.1f%%', startangle=140, explode=explode, pctdistance=0.85)

plt.title("osu!mania top #1000 main keymode (2024)")

# A white circle in the middle turns the pie into a donut
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.show()
```
The trend of the three-digit rank threshold can be seen in another project of mine, oscarcx123/osu-minimum-pp
The three-digit threshold has now reached 12100pp. Pure 4k really is hard to farm, so no wonder 7k pp is about to be nerfed again
0x02 - Sponsoring ppy
Let me see how many cheapskates there are👀
Each player has two boolean values, has_supported and is_supporter. A simple groupby gives the number of players in each of the three states: "never", "formerly" and "currently"
```python
df_draw = df.groupby(['has_supported', 'is_supporter']).size().to_frame(name='count')
# Groups come out in sorted order: (False, False), (True, False), (True, True)
status = ['never', 'was_supporter', 'is_supporter']
df_draw['status'] = status

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

explode = [0.1, 0, 0]

plt.pie(df_draw['count'], labels=df_draw['status'], autopct='%1.1f%%', startangle=140, explode=explode, pctdistance=0.85)
plt.title("osu!mania top #1000 supporter (2024)")

centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.show()
```
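Labelling the groups by position works as long as the groupby returns exactly three groups in that order. As a hedged alternative, the label can be derived from the two flags directly. `supporter_status` is a name made up for this sketch and the sample rows are invented.

```python
from collections import Counter


def supporter_status(has_supported, is_supporter):
    """Map the two profile flags to one of the three states."""
    if is_supporter:
        return "is_supporter"
    if has_supported:
        return "was_supporter"
    return "never"


# Invented (has_supported, is_supporter) pairs
players = [(False, False), (True, False), (True, True), (True, True)]
counts = Counter(supporter_status(h, s) for h, s in players)
print(counts)
```

With the DataFrame used in this post, the same function could be applied row by row to build a status column before grouping, so a missing group would no longer shift the labels.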
So it seems the vast majority of top players have bought osu!supporter at least once. But wait, is that really the case? This time I also scraped support_level. Its value ranges from 0 to 3 and corresponds to the supporter heart on the profile page; it should only go up for people who have actually paid.
My guess is that support_level maps as follows
```
0 = never bought
1 = bought less than 1 year in total
2 = bought 1 - 5 years in total
3 = bought more than 5 years in total
```
Let's run the code and see
```python
df_draw = df.groupby('support_level').size().to_frame(name='count')
df_draw = df_draw.sort_values('support_level', ascending=True).reset_index()
df_draw['support_level'] = 'L' + df_draw['support_level'].astype(str)
df_draw

cm = plt.get_cmap('Set3')

matplotlib.rcParams["axes.prop_cycle"] = cycler(
    color=[cm(v) for v in np.linspace(0, 1, len(df_draw))]
)

plt.pie(df_draw['count'], labels=df_draw['support_level'], autopct='%1.1f%%', startangle=140, pctdistance=0.85)
plt.title("osu!mania top #1000 support level (2024)")

centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.show()
```
Looking at this, it seems a lot of people's supporter was gifted by someone else??
Let's drop the players who have never been a supporter and look again
```python
df_draw = df[df['has_supported'] == 1]
df_draw = df_draw.groupby('support_level').size().to_frame(name='count')
df_draw = df_draw.sort_values('support_level', ascending=True).reset_index()
df_draw['support_level'] = 'L' + df_draw['support_level'].astype(str)
print(df_draw)
```
| support_level | count |
| --- | --- |
| L0 | 584 |
| L1 | 119 |
| L2 | 121 |
| L3 | 30 |
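If I'm not misreading this, a good many top players never bought supporter themselves and got it some other way...
As usual, let's look at the chief cheapskates👀
The code here has been tidied up a little: the previous version used two .loc calls, but a single one is enough for the filtering
```python
df.loc[(df['has_supported'] == False), ['rank', 'username', 'country_code']]
```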
| rank | username | country_code |
| --- | --- | --- |
| 60 | LR2MAG | KR |
| 61 | RaffCo | ID |
| 72 | karcice | KR |
| 98 | Dius | KR |
| 114 | 7keyEgoist | JP |
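0x03 - Name-change tycoons
Unlike other games, osu doesn't let you change your ID freely; each change costs more than the last, and the price list is shown below. If you have bought supporter, the first name change is free.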
| Changes | Price |
| --- | --- |
| 1 | US$4 |
| 2 | US$8 |
| 3 | US$16 |
| 4 | US$32 |
| 5 | US$64 |
| 6+ | US$100 |
So let's see how many times everyone has changed their ID~
```python
df_draw = df.groupby('name_change_count').size().to_frame(name='count').reset_index()

fig, ax = plt.subplots()
bars = ax.bar(df_draw['name_change_count'], df_draw['count'])
ax.bar_label(bars)

ax.set_title("osu!mania top #1000 player name change (2024)")
ax.set_xlabel('# Name Change')
ax.set_ylabel('Player Count')
plt.show()
```
Not much seems to have changed~
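| name_chg_times | 2022 | 2024 | chg |
| --- | --- | --- | --- |
| 0 | 419 | 400 | -19 |
| 1 | 390 | 405 | 15 |
| 2 | 136 | 126 | -10 |
| 3 | 42 | 46 | 4 |
| 4 | 12 | 20 | 8 |
| 5 | 0 | 2 | 2 |
| 6 | 1 | 1 | 0 |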
Let's see who the name-change maniacs are👀
```python
df.loc[(df['name_change_count'] == 5) | (df['name_change_count'] == 6), ['rank', 'username', 'country_code', 'name_change_count']]
```
| rank | username | country_code | name_change_count |
| --- | --- | --- | --- |
| 18 | ZoyFangirl | KR | 5 |
| 533 | Lovelyn | FI | 6 |
| 972 | [KC]CruB | US | 5 |
ppy gets to lie back and count the money; let's see how much the top players have contributed
```python
df_draw = df.groupby('name_change_count').size().to_frame(name='count').reset_index()
# Cumulative US$ paid after 0..6 changes (4, 4+8, 4+8+16, ...)
df_draw['cost'] = [0, 4, 12, 28, 60, 124, 224]
df_draw['ppy_laugh'] = df_draw['cost'] * df_draw['count']
df_draw['ppy_laugh'].sum()
```
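The hard-coded cost list is just the running total of the price table, so the arithmetic can be checked by hand. The sketch below rebuilds it from the prices and multiplies by the 2024 player counts from the table above. It rests on two simplifications that are mine, not the post's: the free first change for supporters is ignored, and nobody has changed names more than 6 times. Under those assumptions it is an upper-bound estimate rather than actual revenue.

```python
from itertools import accumulate

# Price of the 1st..6th name change in US$, from the price table
prices = [4, 8, 16, 32, 64, 100]

# Total paid after n changes, n = 0..6
cumulative = [0] + list(accumulate(prices))

# Players per name_change_count in 2024, from the table above
counts_2024 = {0: 400, 1: 405, 2: 126, 3: 46, 4: 20, 5: 2, 6: 1}

total = sum(cumulative[n] * players for n, players in counts_2024.items())
print(cumulative)  # [0, 4, 12, 28, 60, 124, 224]
print(total)       # 6092
```

So under these assumptions the top 1000 would have spent at most about US$6092 on name changes.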
0x04 - Grinders
Keyboard destroyers
The osu profile page shows Total Hits, which is the number of keystrokes
```python
df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'total_hits']].head(10)
```
| rank | username | country_code | main_keymode | total_hits |
| --- | --- | --- | --- | --- |
| 7 | bojii | PH | 7k | 140716567 |
| 9 | Stellium | KR | 7k | 122258223 |
| 809 | Min- | KR | 7k | 120880661 |
| 346 | JDS20 | CO | 7k | 119917450 |
| 334 | masaya | NO | 7k | 118370621 |
| 322 | X_Devil | RU | 7k | 111932394 |
| 203 | palmEuEi | TH | 7k | 111731128 |
| 23 | Arona | PH | 7k | 110712990 |
| 61 | [ M Y S T I C ] | KR | 7k | 109373225 |
| 141 | Mafuyu87Fanboy | CN | 7k | 102640755 |
No Chinese player made this list in 2022; this year our grinder Xuegao (雪糕) has finally squeezed into the global top 10🎉
Let's draw a boxplot to look at the distribution
```python
def box_plot(data, edge_color, fill_color):
    # Draws on the module-level `ax` created below
    bp = ax.boxplot(data, patch_artist=True, vert=False, widths=0.4)

    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edge_color)

    for patch in bp['boxes']:
        patch.set(facecolor=fill_color)

    return bp


plt.rcParams["figure.figsize"] = (6, 2)

df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)

fig, ax = plt.subplots()
box_plot(df_draw['total_hits'], 'blue', 'cyan')

plt.tick_params(left=False, labelleft=False)
plt.gca().xaxis.grid(True)
ax.set_title("osu!mania top #1000 total hits")
ax.set_xlabel('Total Hits (millions)')
plt.ticklabel_format(style='sci', axis='x', scilimits=(6, 6))
plt.show()
```
As expected, there are only a handful of outliers. Now let's look at the median.
```python
df_draw['total_hits'].median()
```
The median was only 26 million in 2022; this year it has reached 33 million
Next, the top ten by hit count in China (the change figures were entered by hand)
```python
df_draw = df.sort_values('total_hits', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'total_hits']].head(10)
```
| rank | username | main_keymode | total_hits | chg |
| --- | --- | --- | --- | --- |
| 141 | Mafuyu87Fanboy | 7k | 102640755 | 24963372 |
| 376 | Carpihat | 7k | 89420880 | 26150589 |
| 119 | ExNeko | 7k | 78289749 | 17419308 |
| 63 | Stink God | 7k | 76219518 | 7320228 |
| 221 | Mito Van | 7k | 68875972 | 17053893 |
| 324 | [GB]King Fish | 7k | 68470057 | 16767038 |
| 196 | Chenut BS | 7k | 65536382 | 13833363 |
| 347 | idqoos123 | 7k | 63099663 | new entry |
| 45 | HxcQ777 | 7k | 61966696 | new entry |
| 38 | [Crz]Reimu | 7k | 58388849 | 11944925 |
As you can see, there are no 4k players left on this list; it is all 7k players
Masters of time
Besides the keystroke count there is another metric, Total Play Time, which seems to count only the time spent actually playing maps
```python
df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
df_draw = df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'play_time']].head(10)
# play_time is stored in seconds; show it in days
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: str(round(x / 3600 / 24, 2)) + ' days')
df_draw
```
| rank | username | country_code | main_keymode | play_time |
| --- | --- | --- | --- | --- |
| 203 | palmEuEi | TH | 7k | 97.75 days |
| 141 | Mafuyu87Fanboy | CN | 7k | 96.2 days |
| 7 | bojii | PH | 7k | 91.79 days |
| 334 | masaya | NO | 7k | 88.69 days |
| 606 | -Willow- | AU | 7k | 85.59 days |
| 346 | JDS20 | CO | 7k | 84.67 days |
| 322 | X_Devil | RU | 7k | 83.89 days |
| 272 | hisaella | EE | 7k | 78.75 days |
| 893 | Axfaerie | PH | 4k | 77.39 days |
| 444 | -Lalito898 | PE | 4k | 74.55 days |
Xuegao was second in 2022 and is still second now, haha
The plotting code is almost the same as above. Just remember to preprocess play_time first: the scraped values are in seconds, and converting them to days is more intuitive.
```python
def box_plot(data, edge_color, fill_color):
    bp = ax.boxplot(data, patch_artist=True, vert=False, widths=0.4)

    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edge_color)

    for patch in bp['boxes']:
        patch.set(facecolor=fill_color)

    return bp


plt.rcParams["figure.figsize"] = (6, 2)

df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
# Seconds -> days
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: round(x / 3600 / 24, 2))

fig, ax = plt.subplots()
box_plot(df_draw['play_time'], 'blue', 'cyan')

plt.tick_params(left=False, labelleft=False)
plt.gca().xaxis.grid(True)
ax.set_title("osu!mania top #1000 play time (2024)")
ax.set_xlabel('Play Time (days)')
plt.ticklabel_format(axis='x')
plt.show()
```
The median is 25.15 days, up 5 days from 2022
Now the national leaderboard👀
```python
df_draw = df.sort_values('play_time', ascending=False, ignore_index=True)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: round(x / 3600 / 24, 2))
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'play_time']].head(10)
```
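| Rank | Username | Main Keymode | Play Time (days) |
| --- | --- | --- | --- |
| 141 | Mafuyu87Fanboy | 7k | 96.2 |
| 376 | Carpihat | 7k | 72.05 |
| 119 | ExNeko | 7k | 60.5 |
| 221 | Mito Van | 7k | 57.66 |
| 63 | Stink God | 7k | 56.9 |
| 324 | [GB]King Fish | 7k | 46.66 |
| 345 | fishbone2445 | 7k | 45.94 |
| 896 | Ranm | 4k | 45.9 |
| 224 | hisa_knowledge | 7k | 45.62 |
| 763 | Myukee | 7k | 42.22 |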
The lower fences of both boxplots are close to 0 (same as in the 2022 data). These are probably experts who came over from other games, though of course they could also be cheaters. Let me see who they are
```python
df_draw = df.sort_values('total_hits', ascending=True, ignore_index=True)
df_draw = df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'play_count', 'total_hits', 'play_time']].head(10)
df_draw['play_time'] = df_draw['play_time'].apply(lambda x: str(round(x / 3600 / 24, 2)) + ' days')
df_draw
```
| rank | username | country_code | main_keymode | play_count | total_hits | play_time |
| --- | --- | --- | --- | --- | --- | --- |
| 567 | My Angel Rei | SG | 7k | 226 | 563303 | 0.37 days |
| 950 | Twilightprncss | EE | 7k | 321 | 1016530 | 0.69 days |
| 935 | Sujin97 | KR | 7k | 475 | 1044196 | 0.64 days |
| 259 | efewfa | KR | 7k | 351 | 1427974 | 0.8 days |
| 364 | InsCharteux | ID | 7k | 976 | 1521600 | 1.11 days |
| 437 | 141truth | US | 7k | 596 | 1556900 | 0.83 days |
| 457 | UngDiKing | KR | 7k | 766 | 1740292 | 0.98 days |
| 348 | -K i r e i- | JP | 7k | 1150 | 1873711 | 1.36 days |
| 500 | LoveHanu | KR | 7k | 711 | 1906283 | 1.16 days |
| 138 | Li Ji Xian | KR | 7k | 1049 | 2017684 | 1.13 days |
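For anyone who wants to check the fences mentioned earlier in this section by hand, here is a minimal sketch of the usual 1.5 x IQR rule. `tukey_fences` is a name made up for this example and the sample data is invented. As far as I know, matplotlib's default boxplot draws each whisker to the most extreme data point that still lies inside these fences, so the whisker end and the fence itself are not necessarily the same number.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    # "inclusive" quartiles use linear interpolation between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Invented sample with one obvious outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low, high = tukey_fences(sample)
outliers = [v for v in sample if v < low or v > high]
print(low, high, outliers)  # -3.5 14.5 [100]
```

A lower fence can come out negative even when every value is positive, which is why a whisker that reaches almost to 0 only says that the smallest values are not far enough from the box to count as outliers.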
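Score farmers
Besides these metrics, we can also look at the osu level and total_score
```
score(n) = 5,000 / 3 * (4n^3 - 3n^2 - n) + 1.25 * 1.8^(n - 60)    if n <= 100
score(n) = 26,931,190,827 + 99,999,999,999 * (n - 100)            if n > 100
```
According to the official wiki, the two are really the same thing, and you can see that the later levels are very hard to gain: past level 100, every level takes 100 billion score. To keep the later processing simple, I'll just look at total_score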
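To get a feel for how steep the curve is, the formula quoted above can be written as a small function. This is only a sketch of that formula; `score_for_level` is a name made up here and is not something the osu API provides.

```python
def score_for_level(n):
    """Total score required to reach level n, per the formula quoted above."""
    if n <= 100:
        return 5000 / 3 * (4 * n**3 - 3 * n**2 - n) + 1.25 * 1.8 ** (n - 60)
    return 26_931_190_827 + 99_999_999_999 * (n - 100)


# Score needed to go from one level to the next
gap_99_100 = score_for_level(100) - score_for_level(99)
gap_100_101 = score_for_level(101) - score_for_level(100)

print(f"99 -> 100: {gap_99_100:,.0f}")
print(f"100 -> 101: {gap_100_101:,.0f}")
```

The step from 99 to 100 comes out at a bit over 9 billion, while every step past 100 costs roughly 100 billion, about ten times as much.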
```python
# Level plus progress, e.g. level 100 with 25% progress -> 100.25
df['level_real'] = round(df['level'] + df['level_progress'] / 100, 2)
# Total score in millions, as a string such as "52084M"
df['total_score_m'] = df['total_score'].astype(str).str[:-6] + "M"
df_draw = df.sort_values('total_score', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'total_score_m', 'level_real']].head(10)
```
| Rank | Username | Country Code | Main Keymode | Total Score (M) | Real Level |
| --- | --- | --- | --- | --- | --- |
| 661 | Anemia | US | 7k | 52084M | 100.25 |
| 141 | Mafuyu87Fanboy | CN | 7k | 46424M | 100.19 |
| 159 | lxLucasxl | AR | 7k | 45382M | 100.18 |
| 203 | palmEuEi | TH | 7k | 43432M | 100.16 |
| 606 | -Willow- | AU | 7k | 41302M | 100.14 |
| 7 | bojii | PH | 7k | 41281M | 100.14 |
| 334 | masaya | NO | 7k | 39518M | 100.12 |
| 322 | X_Devil | RU | 7k | 39480M | 100.12 |
| 660 | araragigun | KR | 7k | 38072M | 100.11 |
| 588 | 109 | JP | 7k | 37345M | 100.10 |
Once again it's our perennial runner-up Xuegao. Push a little harder this year and knock them all off the top!
The median is 10723M, i.e. 10.7 billion total score. Luckily I have 13.8 billion😋
Now the national leaderboard👀
```python
df_draw = df.sort_values('total_score', ascending=False, ignore_index=True)
df_draw.loc[df_draw['country_code'].isin(['CN', 'MO', 'HK']), ['rank', 'username', 'country_code', 'main_keymode', 'total_score_m', 'level_real']].head(10)
```
| Rank | Username | Main Keymode | Total Score (M) | Real Level |
| --- | --- | --- | --- | --- |
| 141 | Mafuyu87Fanboy | 7k | 46424M | 100.19 |
| 376 | Carpihat | 7k | 30691M | 100.03 |
| 63 | Stink God | 7k | 29267M | 100.02 |
| 119 | ExNeko | 7k | 27984M | 100.01 |
| 347 | idqoos123 | 7k | 21411M | 99.40 |
| 221 | Mito Van | 7k | 21034M | 99.36 |
| 345 | fishbone2445 | 7k | 20412M | 99.29 |
| 896 | Ranm | 4k | 19645M | 99.21 |
| 339 | Quotient | 4k | 18804M | 99.11 |
| 196 | Chenut BS | 7k | 17828M | 99.01 |
Come to think of it, level 99 and level 100 are also roughly 10 billion score apart...
0x05 - All-rounders
Some players play both 4k and 7k, so let's see who the strongest 4k player among the 7k mains is
```python
df_draw = df.copy()
df_draw = df_draw.loc[df_draw['main_keymode'] == '7k']
df_draw = df_draw.sort_values('pp_4k', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'pp_4k', 'pp_7k']].head(10)
```
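| rank | username | country_code | pp_4k | pp_7k |
| --- | --- | --- | --- | --- |
| 7 | bojii | PH | 17349 | 25173 |
| 55 | yeonho7028 | KR | 15593 | 20232 |
| 28 | SillyFangirl | BR | 15525 | 22046 |
| 79 | instal | TH | 15207 | 19439 |
| 211 | gaesol | KR | 14989 | 15247 |
| 89 | grillroasted | CZ | 14814 | 18700 |
| 9 | Stellium | KR | 14602 | 24963 |
| 182 | [LS]bambi fnf | CL | 14538 | 15420 |
| 49 | NkeyZoyDkqehKal | KR | 14372 | 20881 |
| 5 | Kalkai | KR | 14239 | 25761 |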
And the strongest 7k players among the 4k mains (the code is the same, so I won't paste it)
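| rank | username | country_code | pp_4k | pp_7k |
| --- | --- | --- | --- | --- |
| 183 | Orost | BR | 15227 | 14694 |
| 254 | xxxxxx2800 | MY | 14395 | 14358 |
| 267 | jhleetgirl | JP | 14517 | 14094 |
| 236 | Poca | KR | 15080 | 13708 |
| 219 | imyeeyee | KR | 15571 | 13513 |
| 489 | Minwoo3098 | US | 12707 | 11970 |
| 509 | EstaticStatisIO | ID | 12653 | 11907 |
| 541 | Focoo | AR | 12516 | 11616 |
| 498 | pboo2424 | TH | 13028 | 11567 |
| 550 | Rei_Insana300 | EC | 12421 | 11474 |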
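Since the code for this second table was not pasted, here is a guess at what the mirrored query looks like: filter on the other main keymode and sort by the other pp column. `top_off_mode` and the toy DataFrame are made up for this sketch; the post's real data lives in `df`.

```python
import pandas as pd

# Invented toy data with the same column names as the post's DataFrame
toy = pd.DataFrame({
    "username": ["a", "b", "c", "d"],
    "main_keymode": ["4k", "7k", "4k", "4k"],
    "pp_4k": [15000, 9000, 13000, 12500],
    "pp_7k": [14000, 20000, 9500, 11900],
})


def top_off_mode(df, main, other_pp, n=10):
    """Players whose main keymode is `main`, ranked by pp in the other keymode."""
    subset = df.loc[df["main_keymode"] == main]
    return subset.sort_values(other_pp, ascending=False, ignore_index=True).head(n)


# Strongest 7k players among the 4k mains
result = top_off_mode(toy, "4k", "pp_7k")
print(result[["username", "pp_4k", "pp_7k"]])
```

Calling it with `"7k"` and `"pp_4k"` instead should reproduce the first query of this section.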
0x06 - In the public eye
Finally, Xuegao's favorite part: the replay count showdown!
```python
df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'replays_count']].head(10)
```
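| rank | username | country_code | main_keymode | replays_count |
| --- | --- | --- | --- | --- |
| 99 | OutLast | KR | 7k | 382049 |
| 246 | Majesty | KR | 7k | 293656 |
| 88 | myucchii | CL | 4k | 291103 |
| 7 | bojii | PH | 7k | 288776 |
| 299 | gosy777 | KR | 7k | 201678 |
| 753 | Lothus | BR | 7k | 185352 |
| 151 | Estonians | KR | 7k | 158569 |
| 28 | SillyFangirl | BR | 7k | 135702 |
| 19 | ideu- | KR | 7k | 131539 |
| 458 | Cobo- | KR | 7k | 109118 |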
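Many legends of old (inteliser, 0133 and so on) can't be shown here because they have quit, which is a pity. Even so, the players on this list are still top-notch.
Let's draw a boxplot to see the distribution (ln is applied here, otherwise the whole plot gets squeezed to the left and is unreadable)
```python
df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
# Natural log, only to make the plot readable
df_draw['replays_count'] = df_draw['replays_count'].apply(lambda x: np.log(x))
fig, ax = plt.subplots()
box_plot(df_draw['replays_count'], 'blue', 'cyan')
plt.tick_params(left=False, labelleft=False)
plt.gca().xaxis.grid(True)
ax.set_title("osu!mania top #1000 replay count (2024)")
ax.set_xlabel('Replay Count (e^x)')
plt.show()
# Median of the raw counts, not of the log-transformed column
df['replays_count'].median()
```
The median is 121.5. Sigh, my replay count hasn't moved at all after all this time, still only 14😭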
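A small sketch of the log trick with invented numbers: because the logarithm is monotonic, it squeezes the long right tail without changing the order of the players, and for an odd number of values the median maps straight back. I use `log1p`/`expm1` here instead of plain `ln`, which is my own substitution, so that a count of 0 does not turn into minus infinity.

```python
import math
import statistics

# Invented replay counts with a very long right tail
replays = [0, 3, 14, 120, 123, 5000, 382049]

# log1p(x) = ln(1 + x), which is defined at x = 0
logged = [math.log1p(x) for x in replays]

raw_median = statistics.median(replays)
# Undo the transform on the median of the logged values
back_transformed = math.expm1(statistics.median(logged))

print(raw_median, round(back_transformed, 6))
```

With an even number of values the two medians can differ slightly, because the average of two logs is not the log of the average. That is one reason to report the median from the raw column and keep the log only for plotting.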
Let's see whether Xuegao is on the national list👀
```python
df_draw = df.sort_values('replays_count', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'replays_count']].head(10)
```
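| Rank | Username | Main Keymode | Replays Count |
| --- | --- | --- | --- |
| 486 | DawnX | 4k | 46686 |
| 63 | Stink God | 7k | 43308 |
| 66 | LiangIaiajan | 7k | 14711 |
| 34 | [Crz]Satori | 7k | 6785 |
| 172 | Krn_ | 7k | 5283 |
| 339 | Quotient | 4k | 4670 |
| 12 | tyrcs | 7k | 4621 |
| 43 | VanWilder | 7k | 3593 |
| 369 | [Crz]Nickname | 4k | 3447 |
| 43 | QingJiDing | 7k | 3145 |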
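Haha, no. Xuegao only has a bit over 2000 replays, not enough to make the list🤡
0x07 - Friends come from afar, too many to keep up with
Friends in osu are one-way follows (green); when the follow is mutual, it turns pink. What's counted here is the friend number shown on the profile page, i.e. how many people follow you.
```python
df_draw = df.sort_values('friend_count', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'friend_count']].head(10)
```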
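| rank | username | country_code | main_keymode | friend_count |
| --- | --- | --- | --- | --- |
| 28 | SillyFangirl | BR | 7k | 9589 |
| 88 | myucchii | CL | 4k | 5145 |
| 7 | bojii | PH | 7k | 4703 |
| 758 | Andere | CL | 7k | 4012 |
| 1 | dressurf | KR | 7k | 3166 |
| 479 | Eliminate | GB | 4k | 2565 |
| 41 | Motion | KR | 7k | 2505 |
| 317 | arcwinolivirus | PH | 7k | 2351 |
| 395 | jkzu123 | DE | 4k | 2262 |
| 474 | CrewK | JP | 4k | 2076 |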
Let's draw a boxplot (the data is log-transformed here as well, otherwise the plot would bunch up on the left)
```python
df_draw = df.sort_values('friend_count', ascending=False, ignore_index=True)
df_draw['friend_count'] = df_draw['friend_count'].apply(lambda x: np.log(x))
fig, ax = plt.subplots()
box_plot(df_draw['friend_count'], 'blue', 'cyan')
plt.tick_params(left=False, labelleft=False)
plt.gca().xaxis.grid(True)
ax.set_title("osu!mania top #1000 friend count (2024)")
ax.set_xlabel('Friend Count (e^x)')
plt.show()
# Median of the raw counts, not of the log-transformed column
df['friend_count'].median()
```
The median friend count is 118; looks like I'm dragging the numbers down
A quick look at the national list👀
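| Rank | Username | Main Keymode | Friend Count |
| --- | --- | --- | --- |
| 12 | tyrcs | 7k | 878 |
| 119 | ExNeko | 7k | 834 |
| 369 | [Crz]Nickname | 4k | 700 |
| 486 | DawnX | 4k | 631 |
| 135 | AWMRone | 7k | 630 |
| 66 | LiangIaiajan | 7k | 573 |
| 579 | lovely_hyahya | 7k | 572 |
| 470 | [Crz]Rachel | 7k | 505 |
| 923 | [Crz]Alleyne | 4k | 488 |
| 376 | Carpihat | 7k | 487 |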
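I didn't expect our virtual pet Big Bad Cat (大坏猫) to turn out to be such a social butterfly
0x08 - Since time immemorial
Let's take a quick look at when the currently active top players registered and got into the game👀
```python
# The CSV only stores join_date, so derive the year first
df['join_year'] = pd.to_datetime(df['join_date'], utc=True).dt.year
df_draw = df.groupby('join_year').size().to_frame(name='count').reset_index()
df_draw = df_draw.sort_values('join_year', ascending=True)
# Keep only the last two digits of the year for the axis labels
df_draw['join_year'] = df_draw['join_year'].astype(str).str[2:]

fig, ax = plt.subplots()
bars = ax.bar(df_draw['join_year'], df_draw['count'])
ax.bar_label(bars)

ax.set_title("osu!mania top #1000 player join year (2024)")
ax.set_xlabel('Year')
ax.set_ylabel('Player Count')

plt.show()
```
I'm a bit curious who those 4 players are who registered in '09 and have stayed active all the way until now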
| Rank | Username | Country Code | Main Keymode | Performance |
| --- | --- | --- | --- | --- |
| 19 | ideu- | KR | 7k | 23290.3 |
| 43 | VanWilder | CN | 7k | 21367.5 |
| 396 | turtlewing | KR | 7k | 14670.3 |
| 451 | inuyashasama | KR | 7k | 14359.6 |
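Fanshen (翻身)!!! Come to think of it, Fanshen turns 35 this year and is still representing China at MWC 7K 2024. Still going strong!
0x09 - Beware the accuracy freaks
Speaking of accuracy freaks, SS counts come to mind. Since only the currently active top 1000 players were scraped here, the ones in hiding can't be caught.
Xuegao is in the group chat every day farming accuracy on children's songs, so let's see whether Xuegao is on the global list👀
```python
df['SS_total'] = df['SS_count'] + df['SSH_count']
df_draw = df.sort_values('SS_total', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'SS_total']].head(10)
```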
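| rank | username | country_code | main_keymode | SS_total |
| --- | --- | --- | --- | --- |
| 99 | OutLast | KR | 7k | 6923 |
| 104 | lnote_ | KR | 7k | 6746 |
| 159 | lxLucasxl | AR | 7k | 5899 |
| 606 | -Willow- | AU | 7k | 4968 |
| 661 | Anemia | US | 7k | 4490 |
| 320 | robby250 | RO | 7k | 4109 |
| 7 | bojii | PH | 7k | 3714 |
| 593 | Miku Meru | BR | 4k | 2857 |
| 798 | Exlude | EE | 7k | 2798 |
| 299 | gosy777 | KR | 7k | 2789 |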
Haha, what happened? Where did that man's smile go🤡
Could he be on the national list?
```python
df_draw = df.sort_values('SS_total', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'SS_total']].head(10)
```
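| Rank | Username | Main Keymode | SS Total |
| --- | --- | --- | --- |
| 141 | Mafuyu87Fanboy | 7k | 2699 |
| 63 | Stink God | 7k | 2203 |
| 781 | Mihyo_San | 7k | 1798 |
| 369 | [Crz]Nickname | 4k | 922 |
| 339 | Quotient | 4k | 916 |
| 119 | ExNeko | 7k | 859 |
| 763 | Myukee | 7k | 757 |
| 896 | Ranm | 4k | 714 |
| 376 | Carpihat | 7k | 625 |
| 535 | lucky icons | 7k | 545 |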
He really is on the national list, nearly 500 SS ahead of Stink God 😨
So who are the accuracy freaks with the highest acc?
```python
df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'accuracy']].head(10)
```
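| Rank | Username | Country Code | Main Keymode | Accuracy |
| --- | --- | --- | --- | --- |
| 641 | [LS]Tenshi | PH | 7k | 99.6653 |
| 976 | diviza | PE | 7k | 99.4974 |
| 765 | Luna I guess | US | 4k | 99.1412 |
| 434 | lyvet | PH | 4k | 99.0993 |
| 935 | Sujin97 | KR | 7k | 99.0888 |
| 737 | Hualow | ID | 4k | 98.9990 |
| 554 | [GB]SuddenDeath | KR | 4k | 98.9655 |
| 389 | [Albert] | ID | 4k | 98.8328 |
| 625 | Fieri | ID | 4k | 98.8294 |
| 407 | Hello_Son | US | 4k | 98.8161 |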
Next, as usual, the homegrown accuracy freaks
```python
df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'accuracy']].head(10)
```
| rank | username | main_keymode | accuracy |
| --- | --- | --- | --- |
| 339 | Quotient | 4k | 98.8076 |
| 414 | [GB]nyasun | 4k | 98.7232 |
| 708 | racksack | 4k | 98.5903 |
| 611 | [Paw]Just_MLN | 4k | 98.5578 |
| 592 | [GB]ParasolTree | 4k | 98.5175 |
| 877 | StarTemplar | 4k | 98.4896 |
| 913 | Squis1037 | 4k | 98.4798 |
| 718 | ATP Koshepen | 4k | 98.3490 |
| 815 | [Crz]Caicium | 4k | 98.3145 |
| 424 | neeko the rock | 4k | 98.2423 |
Finally, which homegrown 7k accuracy freaks to watch out for
```python
df_draw = df.sort_values('accuracy', ascending=False, ignore_index=True)
# 7k mains with under 9000 pp in 4k, from CN / MO / HK
df_draw.loc[(df_draw['main_keymode'] == '7k') & (df_draw['pp_4k'] < 9000) & (df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'accuracy']].head(20)
```
| Rank | Username | Accuracy |
| --- | --- | --- |
| 34 | [Crz]Satori | 98.0678 |
| 43 | QingJiDing | 97.8572 |
| 43 | VanWilder | 97.8104 |
| 172 | Krn_ | 97.7941 |
| 202 | U1d | 97.7313 |
| 141 | Mafuyu87Fanboy | 97.6430 |
| 750 | tangjinxi | 97.6200 |
| 153 | - Minato Aqua - | 97.6176 |
| 22 | af- | 97.6103 |
| 66 | LiangIaiajan | 97.5192 |
| 119 | ExNeko | 97.4939 |
| 196 | Chenut BS | 97.3203 |
| 260 | 10086kfry | 97.3195 |
| 763 | Myukee | 97.3040 |
| 781 | Mihyo_San | 97.2728 |
| 853 | jhlee0I33 | 97.2546 |
| 347 | idqoos123 | 97.2320 |
| 201 | Mi-a | 97.2275 |
| 171 | Longe | 97.2046 |
| 180 | RiskyMonster272 | 97.1972 |
0x10 - I am a big sieve
This time I noticed that ppy also provides count_miss, which should be the career miss count in mania. Let me see who the big sieves are
```python
df_draw = df.sort_values('count_miss', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'count_miss']].head(10)
```
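| rank | username | country_code | main_keymode | count_miss |
| --- | --- | --- | --- | --- |
| 325 | AdamYuan | CN | 7k | 2749684 |
| 346 | JDS20 | CO | 7k | 2524503 |
| 809 | Min- | KR | 7k | 2449145 |
| 9 | Stellium | KR | 7k | 2400575 |
| 756 | StevenS | EC | 7k | 2341577 |
| 301 | do you fart | NZ | 7k | 2256312 |
| 376 | Carpihat | CN | 7k | 2253024 |
| 23 | Arona | PH | 7k | 2237571 |
| 344 | Stoom | DK | 7k | 2102566 |
| 133 | invadey | US | 4k | 2012134 |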
This ranking has a problem, though: the miss count is positively correlated with total hits (total_hits), so the miss ratio is probably the more suitable measure
```python
df['miss_ratio'] = round(df['count_miss'] / df['total_hits'], 4)
df_draw = df.sort_values('miss_ratio', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'miss_ratio']].head(10)
```
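| rank | username | country_code | main_keymode | miss_ratio |
| --- | --- | --- | --- | --- |
| 325 | AdamYuan | CN | 7k | 0.0824 |
| 357 | Cattlea | JP | 7k | 0.0545 |
| 231 | SoftC418 | CN | 7k | 0.0369 |
| 866 | THIS A PERSON | US | 7k | 0.0364 |
| 301 | do you fart | NZ | 7k | 0.0364 |
| 524 | imstupidfor7k | SG | 7k | 0.0344 |
| 258 | Shepped | CL | 7k | 0.0343 |
| 917 | WoodKliz | PA | 7k | 0.0337 |
| 495 | aceqwer370 | KR | 7k | 0.0333 |
| 636 | DannyXLee | CN | 7k | 0.0324 |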
Uh, our very own AdamYuan from China is way out in front as the gold-medal sieve, far ahead of everyone else
Next, as usual, the homegrown sieves
```python
df_draw = df.sort_values('count_miss', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'count_miss']].head(10)
```
| rank | username | main_keymode | count_miss |
| --- | --- | --- | --- |
| 325 | AdamYuan | 7k | 2749684 |
| 376 | Carpihat | 7k | 2253024 |
| 24 | SoftC418 | 7k | 1680528 |
| 58 | Mafuyu87Fanboy | 7k | 1357278 |
| 62 | Mito Van | 7k | 1310800 |
| 79 | [GB]King Fish | 7k | 1242739 |
| 92 | Yozomi | 7k | 1213444 |
| 113 | shiyu1213 | 7k | 1128582 |
| 118 | [GB]Burger King | 7k | 1093815 |
| 133 | kanasshi | 7k | 1060956 |
```python
df_draw = df.sort_values('miss_ratio', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'miss_ratio']].head(10)
```
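| rank | username | main_keymode | miss_ratio |
| --- | --- | --- | --- |
| 325 | AdamYuan | 7k | 0.0824 |
| 231 | SoftC418 | 7k | 0.0369 |
| 636 | DannyXLee | 7k | 0.0324 |
| 934 | kanasshi | 7k | 0.0316 |
| 746 | Tat3 | 7k | 0.0315 |
| 624 | [Crz]Zetsfy | 7k | 0.0285 |
| 547 | shiyu1213 | 7k | 0.0277 |
| 585 | ToukiM | 7k | 0.0273 |
| 626 | Croatian songs | 4k | 0.0269 |
| 814 | _Reimu | 7k | 0.0266 |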
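0x11 - Beware the combo freaks
Having covered the sieves, we naturally can't skip the combo freaks. In the MCNC 7K 2024 Semifinals, in the match between Zhang Fan (张帆) and af, Fanshen somehow held combo through several brutal maps. As long as I don't drop, the opponent will drop on their own🥵
Keep in mind, though, that the current version still uses ScoreV1, and maps packed with long notes (especially the lower LN dan courses or release-heavy maps) can rack up extremely high combo counts
```python
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
df_draw.loc[:, ['rank', 'username', 'country_code', 'main_keymode', 'maximum_combo']].head(10)
```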
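| rank | username | country_code | main_keymode | maximum_combo |
| --- | --- | --- | --- | --- |
| 893 | Axfaerie | PH | 4k | 55637 |
| 425 | Plana_ | PH | 4k | 55554 |
| 858 | BossPlays | AR | 4k | 55538 |
| 538 | ERA Dev | US | 4k | 55509 |
| 208 | Plutes | MX | 7k | 55506 |
| 846 | Loslic | KR | 4k | 55502 |
| 856 | nayeonie bunny | BR | 4k | 55501 |
| 39 | lupesco | MX | 7k | 55495 |
| 375 | SnowScent | KR | 7k | 55486 |
| 88 | myucchii | CL | 4k | 55457 |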
A combo above 50000 almost certainly comes from this map: Between the Buried and Me - The Parallax II: Future Sequence. This loved map is 1 hour 12 minutes long. All I can say is that these people are hardcore...
Next, which homegrown combo freaks to watch out for
```python
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
df_draw.loc[(df_draw['country_code'].isin(['CN', 'MO', 'HK'])), ['rank', 'username', 'main_keymode', 'maximum_combo']].head(10)
```
| Rank | Username | Main Keymode | Maximum Combo |
| --- | --- | --- | --- |
| 369 | [Crz]Nickname | 4k | 36319 |
| 376 | Carpihat | 7k | 30957 |
| 93 | [Paw]FIood | 7k | 27078 |
| 592 | [GB]ParasolTree | 4k | 23349 |
| 224 | hisa_knowledge | 7k | 22391 |
| 66 | LiangIaiajan | 7k | 20552 |
| 339 | Quotient | 4k | 19014 |
| 611 | [Paw]Just_MLN | 4k | 18452 |
| 909 | [Crz]Xinyi2016 | 4k | 17801 |
| 424 | neeko the rock | 4k | 17635 |
As everyone knows, 7k is more prone to misses, so let's see which homegrown 7k combo freaks to watch out for
```python
df_draw = df.sort_values('maximum_combo', ascending=False, ignore_index=True)
# & binds tighter than |, so the username check is OR-ed with the whole three-part filter
df_draw.loc[((df_draw['main_keymode'] == '7k') & (df_draw['pp_4k'] < 9000) & (df_draw['country_code'].isin(['CN', 'MO', 'HK'])))
            | (df_draw['username'] == 'SilentParleHorn'),
            ['rank', 'username', 'maximum_combo']].head(30)
```
| rank | username | maximum_combo |
| --- | --- | --- |
| 66 | LiangIaiajan | 20552 |
| 345 | fishbone2445 | 16334 |
| 34 | [Crz]Satori | 16308 |
| 141 | Mafuyu87Fanboy | 15990 |
| 171 | Longe | 14455 |
| 196 | Chenut BS | 14249 |
| 172 | Krn_ | 13754 |
| 201 | SilentParleHorn | 13640 |
| 744 | O2jam Ultima | 13556 |
| 347 | idqoos123 | 13290 |
| 43 | VanWilder | 12920 |
| 221 | Mito Van | 12773 |
| 153 | - Minato Aqua - | 11891 |
| 481 | quailty | 10991 |
| 135 | _Yiiiii | 10742 |
| 275 | pipisugar | 9915 |
| 490 | ICDYO | 9650 |
| 763 | Myukee | 9195 |
| 142 | Watch01 | 9103 |
| 717 | Yozomi | 8636 |
| 22 | af- | 8566 |
| 63 | Stink God | 8414 |
| 119 | ExNeko | 8369 |
| 260 | 10086kfry | 8081 |
| 989 | jackyuanchen | 8066 |
| 202 | U1d | 7979 |
| 43 | QingJiDing | 7859 |
| 274 | KafuuChino | 7735 |
| 470 | [Crz]Rachel | 7723 |
| 198 | [GB]hej_067 | 7635 |
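Why is the combo-freak list so long? Because in tournaments you have to watch out for combo freaks!!
0x12 - Afterword
There isn't much more to write. If anyone needs the dataset, just ask me for it. The analysis is for reference only. If you have doubts, well, then you have doubts; I only wrote this for fun anyway.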