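【Speaker Bio】:Zhang Zike (张子柯) is a professor and doctoral supervisor, deputy director of the Digital Communication Research Center at Zhejiang University, and deputy head of the textbook development group for Zhejiang University's general-education courses in artificial intelligence. Main research interest: computation-driven complex social systems. Has led projects funded by the National Natural Science Foundation of China, a sub-project of a National Natural Science Foundation of China Major Program, a project under a National Social Science Fund Major Program, a major project of a Ministry of Education Key Research Base for Humanities and Social Sciences, and a project under the European Union's Seventh Framework Programme, among others. Awards include the Second Prize in Natural Science from the China Computer Association (中国计算机协会), in a year when no first prize was awarded. In recent years, named Zhejiang Province Outstanding Teacher and Zhejiang Province Advanced Individual in Teacher Ethics. Also serves as secretary-general of the Complexity Science Research Society (复杂性科学研究会) and as an executive council member of the Intelligent and Computational Communication Committee of the Chinese Association for History of Journalism and Communication (中国新闻史学会).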
【Abstract】:The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum achievable performance of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, using standard objective functions, evaluation metrics, and binary classifiers, we arrive at two principal conclusions. First, we show that the theoretical upper bound of binary classification performance on real datasets can actually be attained. This upper bound represents a calculable equilibrium between the learning loss and the evaluation metric. Second, we compute the exact upper bounds for three commonly used evaluation metrics and find that they are consistent with our overarching thesis: the upper bound is determined by the dataset's characteristics and is independent of the classifier in use. Our subsequent analysis further reveals a detailed relationship between the upper bound of performance and the level of class overlap within the binary classification data. This relationship is instrumental for pinpointing the most effective feature subsets in feature engineering. This work has potential applications in data-driven research, where it can help quantitatively evaluate the trade-off between improving algorithm performance and improving data quality.
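The link between class overlap and a classifier-independent ceiling can be illustrated with a small sketch. It is not the speaker's derivation: it assumes accuracy as the metric (the abstract does not name its three metrics) and treats "overlap" as identical feature vectors that carry conflicting labels. Under those assumptions, no deterministic classifier can do better on the dataset than predicting the majority label for each distinct feature vector. The function name `accuracy_upper_bound` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def accuracy_upper_bound(samples):
    """Best accuracy any deterministic classifier can reach on `samples`.

    `samples` is an iterable of (features, label) pairs, where `features`
    is hashable (e.g. a tuple) and `label` is 0 or 1. Assumption: overlap
    means identical feature vectors with conflicting labels.
    """
    # Count how often each label occurs for every distinct feature vector.
    label_counts = defaultdict(Counter)
    for features, label in samples:
        label_counts[features][label] += 1

    total = sum(sum(c.values()) for c in label_counts.values())
    if total == 0:
        raise ValueError("samples must not be empty")

    # For each feature vector, at most the majority-label count can be
    # classified correctly, whatever the classifier is.
    best_correct = sum(max(c.values()) for c in label_counts.values())
    return best_correct / total


# Toy data: (0,) and (2,) appear with both labels, so they overlap.
toy_data = [
    ((0,), 0), ((0,), 0), ((0,), 1),
    ((1,), 1), ((1,), 1),
    ((2,), 0), ((2,), 1),
]
bound = accuracy_upper_bound(toy_data)
clean_bound = accuracy_upper_bound([((0,), 0), ((1,), 1)])
```

On the toy data, 5 of the 7 samples can be classified correctly at best, so the bound is 5/7; with no overlapping samples the bound is 1. Dropping or adding a feature changes which samples collide, which is one way to read the abstract's remark about choosing feature subsets.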
【Time】:November 29, 2025, 09:30-10:30
【Venue】:Room 417, Weiyu Building (位育楼)

