Genomics & Informatics

Table 4 Precision, recall, and F1 metrics for entities present in the formatted outputs of the LLMs, using the second evaluation strategy and expanded by entity category

From: Comparative analysis of generative LLMs for labeling entities in clinical notes

		Variation 2			Variation 3
Model	Category	P	R	F1	P	R	F1
llama-2-7b	DISEASE	0.500	0.008	0.015	0.333	0.003	0.005
	PROCEDURE	0.000	0.000	0.000	0.500	0.001	0.003
	SYMPTOM	0.667	0.004	0.008	0.000	0.000	0.000
	micro avg	0.556	0.003	0.006	0.222	0.001	0.003
llama-2-7b-chat	DISEASE	0.349	0.055	0.096	0.500	0.040	0.075
	PROCEDURE	0.000	0.000	0.000	0.323	0.015	0.028
	SYMPTOM	0.494	0.080	0.138	0.472	0.033	0.062
	micro avg	0.432	0.040	0.073	0.434	0.027	0.051
codellama-7b-instruct	DISEASE	0.413	0.096	0.155	0.000	0.000	0.000
	PROCEDURE	0.000	0.000	0.000	0.000	0.000	0.000
	SYMPTOM	0.560	0.210	0.305	0.000	0.000	0.000
	micro avg	0.512	0.091	0.155	0.000	0.000	0.000
mistral-7b-v0.1	DISEASE	0.167	0.003	0.005	0.462	0.015	0.029
	PROCEDURE	0.667	0.003	0.006	0.250	0.001	0.003
	SYMPTOM	0.462	0.012	0.023	0.400	0.004	0.008
	micro avg	0.409	0.006	0.011	0.409	0.006	0.011
mistral-7b-instruct-v0.2	DISEASE	0.467	0.141	0.217	0.571	0.101	0.171
	PROCEDURE	0.000	0.000	0.000	0.421	0.012	0.023
	SYMPTOM	0.517	0.208	0.297	0.390	0.080	0.133
	micro avg	0.498	0.102	0.170	0.459	0.056	0.100
mixtral-8x7b-instruct-v0.1	DISEASE	0.463	0.189	0.268	0.372	0.146	0.210
	PROCEDURE	0.429	0.031	0.058	0.319	0.087	0.137
	SYMPTOM	0.434	0.237	0.307	0.398	0.157	0.225
	micro avg	0.443	0.137	0.209	0.363	0.124	0.185

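The highest F1-scores for each prompt variation are highlighted in the original table. In every category and in the micro average, for both variations, the highest F1-score is that of mixtral-8x7b-instruct-v0.1.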
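As a reading aid for the P, R, F1, and micro avg columns, here is a minimal sketch of the standard definitions, assuming entity-level counts of true positives, false positives, and false negatives per category. The `Counts` class, the function names, and the example counts are hypothetical and are not taken from the article. The matching rule of the second evaluation strategy is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical entity-level counts for one category."""
    tp: int = 0  # predicted entities that match a gold entity
    fp: int = 0  # predicted entities with no gold match
    fn: int = 0  # gold entities the model missed


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def prf(c: Counts) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one set of counts."""
    p = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    r = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    return p, r, f1_from_pr(p, r)


def micro_average(per_category: dict[str, Counts]) -> tuple[float, float, float]:
    """Micro average: pool the counts over all categories, then score once."""
    total = Counts(
        tp=sum(c.tp for c in per_category.values()),
        fp=sum(c.fp for c in per_category.values()),
        fn=sum(c.fn for c in per_category.values()),
    )
    return prf(total)


# Made-up counts, for illustration only (not data from the article).
example = {
    "DISEASE": Counts(tp=3, fp=1, fn=7),
    "PROCEDURE": Counts(tp=0, fp=0, fn=10),
    "SYMPTOM": Counts(tp=2, fp=2, fn=8),
}

if __name__ == "__main__":
    for name, counts in example.items():
        print(name, [round(x, 3) for x in prf(counts)])
    print("micro avg", [round(x, 3) for x in micro_average(example)])
```

Because the micro average pools counts before scoring, a category with many missed entities (such as PROCEDURE in most rows of the table) pulls the overall recall down even when precision stays moderate. The F1 values in the table can be checked against their P and R columns with `f1_from_pr`; for example, `f1_from_pr(0.463, 0.189)` rounds to 0.268, which agrees with the DISEASE row for mixtral-8x7b-instruct-v0.1 under Variation 2.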

ISSN: 2234-0742
