Aug 1, 2024
Modified Aug 7, 2024
Hello Logic
Introducing Cyber Logic: a comprehensive collection of questions, answers, and explanations aimed at fostering a deep understanding of cybersecurity, networking, and IT support.
Dataset Summary
Cyber-Logic
is a synthetically generated dataset for supervised fine-tuning using the new Mistral-NeMo
model, together with other Mistral models like Mixtral-8x7b
and Mixtral-8x22b
.
The dataset contains questions and explanations for various tech-based tasks like cybersecurity, networking, and IT support. All explanations in this dataset are generated in the COT (Chain of Thought) style.
Data Generation
The dataset was generated using a single 4x4090 machine:
Generating the question-answer pairs took ~ 482 hours
Generating the question-explanation pairs took ~ 40 hours.
Computing the embeddings, assessing quality, and classifying the questions into the three categories ~ 38 hours.
Data Structure
The data has the following structure:
id
: The ID of the prompt. This is for error tracing and data validation. The prompt ID is structured in the following format:dataset:UUID
.messages
: The generated instruct prompt and explanation are stored in an OpenAI-compatible dictionary structure.prompt
: The generated instruct prompt in the messages array.
The Dataset
Download Dataset
HuggingFace
: In the interest of free and opensource development of AI models, the entire dataset has been published to HuggingFace free of charge under theApache 2.0 license
.GitHub
: For those studying for Network+ or Security+, the questions, answers, and explanations have been published in markdown on GitHub. If you have any questions or problems related to this dataset's questions or explanations, don’t hesitate to contact me.