
Table 3 Precision, recall, and F1 metrics for entities identified in the outputs of the LLMs, using the first evaluation strategy

From: Comparative analysis of generative LLMs for labeling entities in clinical notes

 

Model                       |   Variation 1          |   Variation 2          |   Variation 3
                            |   P      R      F      |   P      R      F      |   P      R      F
llama-2-7b                  |   0.986  0.084  0.156  |   0.907  0.023  0.046  |   0.967  0.052  0.099
llama-2-7b-chat             |   0.971  0.182  0.306  |   0.952  0.191  0.318  |   0.957  0.145  0.252
codellama-7b-instruct       |   0.971  0.383  0.549  |   0.961  0.220  0.358  |   0.967  0.191  0.318
mistral-7b-v0.1             |   0.960  0.043  0.083  |   0.951  0.046  0.088  |   0.941  0.038  0.074
mistral-7b-instruct-v0.2    |   0.966  0.222  0.361  |   0.955  0.267  0.418  |   0.929  0.118  0.209
mixtral-8x7b-instruct-v0.1  |   0.974  0.428  0.595* |   0.974  0.365  0.531* |   0.958  0.193  0.321*

  1. P = precision; R = recall; F = F1-score. The highest F1-score for each prompt variation is marked with an asterisk (highlighted in the original table).
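
For reference, the F column is the standard F1-score, i.e. the harmonic mean of precision and recall. The following is a minimal Python sketch (not part of the paper's code) that reproduces one table entry from its P and R values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: mixtral-8x7b-instruct-v0.1, prompt Variation 1 (values from Table 3)
print(round(f1_score(0.974, 0.428), 3))  # 0.595, matching the F column
```

Small discrepancies of ±0.001 for some rows are expected, since the precision and recall values are themselves rounded to three decimals before the check.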