Scenario: tests/test_adaptive_prompts.py::TestComplexityEstimation::test_complex_test_high_complexity
Why Needed: This test is needed because it checks for complexity estimation in tests that have multiple assertions and mocks.
Confidence: 80%
Tokens: 118 input + 103 output = 221 total
Scenario: tests/test_adaptive_prompts.py::TestComplexityEstimation::test_empty_source_zero_complexity
Why Needed: The test is necessary because it checks the behavior of the `Config` class when given an empty source.
Confidence: 80%
Tokens: 136 input + 163 output = 299 total
Scenario: tests/test_adaptive_prompts.py::TestComplexityEstimation::test_simple_test_low_complexity
Why Needed: This test is needed because it checks if Simple tests have low complexity scores.
Confidence: 80%
Tokens: 115 input + 79 output = 194 total
Scenario: Test invalid prompt tier
Why Needed: To test that the 'invalid' prompt tier is considered an error during validation.
Confidence: 80%
Tokens: 126 input + 110 output = 236 total
Scenario: Valid prompt tiers are validated
Why Needed: To ensure that the `prompt_tier` parameter is correctly validated and does not cause any issues.
Confidence: 80%
Tokens: 142 input + 84 output = 226 total
Scenario: tests/test_adaptive_prompts.py::TestPromptTierSelection::test_auto_tier_complex_test
Why Needed: Auto mode should use standard prompt for complex tests.
Confidence: 80%
Tokens: 122 input + 80 output = 202 total
Scenario: tests/test_adaptive_prompts.py::TestPromptTierSelection::test_auto_tier_simple_test
Why Needed: To ensure that the auto mode selects a minimal prompt for simple tests, which is essential for efficient and effective test execution.
Confidence: 80%
Tokens: 155 input + 91 output = 246 total
Scenario: tests/test_adaptive_prompts.py::TestPromptTierSelection::test_minimal_tier_override
Why Needed: Config override to minimal should always use minimal prompt.
Confidence: 80%
Tokens: 122 input + 91 output = 213 total
Scenario: Config override to standard should always use standard prompt.
Why Needed: Because the config override is not necessary in this case, and using the standard prompt is a better practice.
Confidence: 80%
Tokens: 148 input + 77 output = 225 total
Scenario: Test that the aggregate function correctly handles all policy when aggregating multiple test cases
Why Needed: This test prevents regression where an aggregation of multiple tests results in only one test being retained due to a missing or incomplete policy.
Confidence: 80%
Tokens: 364 input + 180 output = 544 total
Scenario: tests/test_aggregation.py::TestAggregator::test_aggregate_dir_not_exists
Why Needed: The test is failing because the `aggregate` method of the Aggregator class does not check if the aggregation directory exists before aggregating data.
Confidence: 80%
Tokens: 104 input + 109 output = 213 total
Scenario: Test that the `aggregate` method consistently picks the latest policy for a given test case across different times.
Why Needed: This test prevents regression where the latest policy is not picked correctly due to inconsistent timing of reports.
Confidence: 80%
Tokens: 477 input + 205 output = 682 total
Scenario: tests/test_aggregation.py::TestAggregator::test_aggregate_no_dir_configured
Why Needed: The test is necessary because the aggregator requires a directory to aggregate data from.
Confidence: 80%
Tokens: 110 input + 84 output = 194 total
Scenario: The `aggregate` method of the Aggregator class should not be called when there are no reports to aggregate.
Why Needed: This test prevents a potential bug where the `aggregate` method is called on an empty list of reports, causing it to return `None` instead of raising an error.
Confidence: 80%
Tokens: 201 input + 142 output = 343 total
Scenario: Test that coverage and LLM annotations are properly deserialized and can be re-serialized.
Why Needed: Prevents regression in core functionality by ensuring accurate token usage and coverage information is preserved during serialization.
Confidence: 80%
Tokens: 1002 input + 121 output = 1123 total
Scenario: Test source coverage summary deserialization when aggregate is configured.
Why Needed: This test prevents a potential bug where the source coverage summary is not correctly deserialized if the aggregate configuration does not match the report format.
Confidence: 80%
Tokens: 395 input + 158 output = 553 total
Scenario: Test loading coverage from configured source file when option is not set.
Why Needed: To prevent a potential bug where the test fails due to an incorrect assumption about the configuration.
Confidence: 80%
Tokens: 584 input + 189 output = 773 total
Scenario: test_recalculate_summary verifies that the aggregator recalculates the latest summary correctly and preserves coverage percentage.
Why Needed: This test prevents regression in the aggregation logic, ensuring that the latest summary is calculated accurately and coverage percentage remains preserved.
Confidence: 80%
Tokens: 473 input + 198 output = 671 total
Scenario: Test that skipping an invalid JSON report prevents the aggregation from counting it as a valid report.
Why Needed: This test ensures that the `aggregate` function correctly handles reports with missing or malformed data, preventing it from incorrectly counting them as valid.
Confidence: 80%
Tokens: 352 input + 182 output = 534 total
Scenario: The test verifies that the aggregator recalculates the summary correctly when new tests are added and the latest summary is updated.
Why Needed: This test prevents regression in coverage calculation when new tests are added to the aggregation process, ensuring accuracy of the overall coverage percentage.
Confidence: 80%
Tokens: 299 input + 125 output = 424 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_batch_optimization_message
Why Needed: To test the batch optimization message functionality.
Confidence: 80%
Tokens: 112 input + 97 output = 209 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_cached_progress_reporting
Why Needed: To ensure that the progress reporting is cached correctly and not overwritten by subsequent requests.
Confidence: 80%
Tokens: 101 input + 163 output = 264 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_cached_tests_are_skipped
Why Needed: The test is necessary because it checks if cached tests are skipped. This is a critical functionality for ensuring the reliability and efficiency of the annotator.
Confidence: 80%
Tokens: 102 input + 134 output = 236 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_concurrent_annotation
Why Needed: To ensure that annotators can annotate data in a concurrent manner without causing any issues.
Confidence: 80%
Tokens: 98 input + 139 output = 237 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_concurrent_annotation_handles_failures
Why Needed: This test is needed because the current implementation of `annotate` may not handle failures correctly when multiple annotators are working concurrently.
Confidence: 80%
Tokens: 116 input + 128 output = 244 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_progress_reporting
Why Needed: To ensure that the annotator reports progress accurately and consistently throughout the annotation process.
Confidence: 80%
Tokens: 96 input + 134 output = 230 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_reports_progress_messages
Why Needed: To ensure that the progress messages are returned correctly when annotating reports.
Confidence: 80%
Tokens: 101 input + 146 output = 247 total
Scenario: tests/test_annotator.py
Why Needed: The test is necessary to ensure that the annotator respects the opt-out and limit settings.
Confidence: 80%
Tokens: 104 input + 102 output = 206 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_respects_rate_limit
Why Needed: The test respects the rate limit for annotating a single document.
Confidence: 80%
Tokens: 112 input + 113 output = 225 total
Scenario: Sequential annotation
Why Needed: To ensure that the annotator can correctly annotate sequential data.
Confidence: 80%
Tokens: 98 input + 122 output = 220 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_sequential_annotation_error_tracking
Why Needed: Error tracking in sequential annotation is necessary to ensure that errors are properly reported and handled by the annotator.
Confidence: 80%
Tokens: 105 input + 159 output = 264 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_skips_if_disabled
Why Needed: The test should be skipped when the LLM (Large Language Model) is not enabled.
Confidence: 80%
Tokens: 108 input + 81 output = 189 total
Scenario: tests/test_annotator.py::TestAnnotateTests::test_skips_if_provider_unavailable
Why Needed: The test is skipped if the provider is unavailable because it prevents the test from running and potentially causing data loss.
Confidence: 80%
Tokens: 101 input + 90 output = 191 total
Scenario: Test Base Parse Response Malformed JSON After Extract
Why Needed: To ensure that the `_parse_response` method correctly handles malformed JSON responses and raises a `JSONDecodeError` with a meaningful error message.
Confidence: 80%
Tokens: 152 input + 114 output = 266 total
Scenario: Verify that the `test_base_parse_response_non_string_fields` test case checks for non-string fields in the response data.
Why Needed: This test prevents a potential bug where the function does not handle cases with non-string fields in its response data.
Confidence: 80%
Tokens: 269 input + 103 output = 372 total
Scenario: tests/test_base_maximal.py::TestGetProvider::test_get_gemini_provider
Why Needed: To ensure the `get_gemini_provider` function is correctly creating a `GeminiProvider` instance.
Confidence: 80%
Tokens: 104 input + 83 output = 187 total
Scenario: tests/test_base_maximal.py::TestGetProvider::test_get_invalid_provider
Why Needed: To ensure that a ValueError is raised when an unknown LLM provider is specified.
Confidence: 80%
Tokens: 106 input + 80 output = 186 total
Scenario: tests/test_base_maximal.py::TestGetProvider::test_get_litellm_provider
Why Needed: To ensure the `get_litellm_provider` function returns a valid instance of `LiteLLMProvider`.
Confidence: 80%
Tokens: 109 input + 86 output = 195 total
Scenario: tests/test_base_maximal.py::TestGetProvider::test_get_noop_provider
Why Needed: The test is necessary to ensure that the `get_provider` function returns a NoopProvider instance when no provider is specified.
Confidence: 80%
Tokens: 104 input + 87 output = 191 total
Scenario: tests/test_base_maximal.py::TestGetProvider::test_get_ollama_provider
Why Needed: To verify the correctness of the OllamaProvider class.
Confidence: 80%
Tokens: 108 input + 89 output = 197 total
Scenario: Verify that the LlmProvider can be checked for availability without raising an exception.
Why Needed: This test prevents a potential bug where the LlmProvider raises an exception when checking its availability, causing the test to fail.
Confidence: 80%
Tokens: 280 input + 99 output = 379 total
Scenario: tests/test_base_maximal.py::TestLlmProviderDefaults
Why Needed: To ensure that the LLM model name is set to the default configuration when no explicit model name is provided.
Confidence: 80%
Tokens: 114 input + 86 output = 200 total
Scenario: tests/test_base_maximal.py
Why Needed: This test ensures that the `get_rate_limits` method returns `None` when no default rate limits are specified.
Confidence: 80%
Tokens: 108 input + 87 output = 195 total
Scenario: tests/test_base_maximal.py
Why Needed: To ensure the LLM provider defaults to false when in local mode.
Confidence: 80%
Tokens: 105 input + 68 output = 173 total
Scenario: Verify that context files are included in the batch prompt.
Why Needed: This test prevents a potential issue where context files are not added to the prompt, potentially leading to incorrect usage of the `build_batch_prompt` function.
Confidence: 80%
Tokens: 261 input + 108 output = 369 total
Scenario: Verifies that the parametrized batch prompt includes all required information.
Why Needed: This test prevents a potential regression where the prompt is missing or incorrect, making it difficult to understand and use the `build_batch_prompt` function.
Confidence: 80%
Tokens: 330 input + 107 output = 437 total
Scenario: Single test should generate normal prompt.
Why Needed: This test prevents a regression where the batched prompt is missing 'Test: test.py::test_foo' and incorrectly includes 'Parameterizations'.
Confidence: 80%
Tokens: 269 input + 108 output = 377 total
Scenario: Verify that the same source code produces the same hash value.
Why Needed: Prevents a potential bug where different versions of the test function produce different hashes, leading to inconsistent results.
Confidence: 80%
Tokens: 220 input + 181 output = 401 total
Scenario: tests/test_batching.py::TestComputeSourceHash::test_different_source_different_hash
Why Needed: To ensure that different source code produces different hashes, which is a requirement for batch processing to work correctly.
Confidence: 80%
Tokens: 127 input + 85 output = 212 total
Scenario: tests/test_batching.py::TestComputeSourceHash::test_empty_source
Why Needed: The current implementation of ComputeSourceHash does not handle an empty source correctly.
Confidence: 80%
Tokens: 94 input + 93 output = 187 total
Scenario: tests/test_batching.py::TestConfigValidation::test_batch_max_tests_minimum
Why Needed: The test is necessary because the `batch_max_tests` configuration option must be at least 1.
Confidence: 80%
Tokens: 126 input + 103 output = 229 total
Scenario: tests/test_batching.py::TestConfigValidation::test_context_line_padding_non_negative
Why Needed: Context line padding must be non-negative.
Confidence: 80%
Tokens: 126 input + 79 output = 205 total
Scenario: Test invalid context compression
Why Needed: To test that an invalid context compression mode raises an error during validation.
Confidence: 80%
Tokens: 122 input + 72 output = 194 total
Scenario: TestConfigValidation
Why Needed: Valid compression modes should pass.
Confidence: 80%
Tokens: 133 input + 75 output = 208 total
Scenario: tests/test_batching.py::TestGetBaseNodeid::test_nested_params
Why Needed: This test is necessary because the current implementation of _get_base_nodeid does not fully strip nested parameters.
Confidence: 80%
Tokens: 109 input + 99 output = 208 total
Scenario: tests/test_batching.py::TestGetBaseNodeid::test_parametrized_nodeid
Why Needed: The test is necessary because it checks the behavior of `_get_base_nodeid` when a parameterized node id is passed.
Confidence: 80%
Tokens: 133 input + 124 output = 257 total
Scenario: tests/test_batching.py::TestGetBaseNodeid::test_simple_nodeid
Why Needed: This test is needed because it checks the behavior of _get_base_nodeid when given a simple nodeid without any parameters.
Confidence: 80%
Tokens: 123 input + 91 output = 214 total
Scenario: Large groups should be split by batch_max_tests.
Why Needed: This test prevents regression where large groups are not split into batches of the correct size, potentially leading to performance issues or incorrect results.
Confidence: 80%
Tokens: 364 input + 106 output = 470 total
Scenario: Test case for batched tests
Why Needed: To ensure that each test is separate and not affected by the batching mechanism.
Confidence: 80%
Tokens: 170 input + 81 output = 251 total
Scenario: Test parametrized tests should be grouped together.
Why Needed: This test prevents a potential regression where the grouping of tests for batching is not considered.
Confidence: 80%
Tokens: 346 input + 129 output = 475 total
Scenario: Single tests should each be their own batch.
Why Needed: This test prevents a potential bug where multiple tests are grouped together and potentially interfere with each other's execution.
Confidence: 80%
Tokens: 278 input + 121 output = 399 total
Scenario: tests/test_cache.py::TestHashSource::test_consistent_hash
Why Needed: To ensure that the cache is consistent across different runs of the test.
Confidence: 80%
Tokens: 107 input + 75 output = 182 total
Scenario: Tests for cache functionality
Why Needed: To ensure that the cache is working correctly and producing unique hashes for different inputs.
Confidence: 80%
Tokens: 108 input + 105 output = 213 total
Scenario: tests/test_cache.py::TestHashSource::test_hash_length
Why Needed: To ensure the hash value is of a fixed length (16 characters).
Confidence: 80%
Tokens: 100 input + 77 output = 177 total
Scenario: Test that clearing the cache removes all entries.
Why Needed: Prevents regression in case of large number of cache entries.
Confidence: 80%
Tokens: 283 input + 82 output = 365 total
Scenario: tests/test_cache.py::TestLlmCache::test_does_not_cache_errors
Why Needed: To ensure that LLM annotations with errors are not cached.
Confidence: 80%
Tokens: 157 input + 73 output = 230 total
Scenario: tests/test_cache.py::TestLlmCache::test_get_missing
Why Needed: To test that the get method returns None for missing entries.
Confidence: 80%
Tokens: 128 input + 70 output = 198 total
Scenario: Verify that annotations can be set and retrieved from the cache.
Why Needed: Prevents bypass attacks by ensuring that LLMCache stores and retrieves annotations in a secure manner.
Confidence: 80%
Tokens: 286 input + 93 output = 379 total
Scenario: tests/test_collector.py::TestCollectorCollectionErrors::test_collection_error_structure
Why Needed: The test is checking if the collection errors have the correct structure.
Confidence: 80%
Tokens: 124 input + 116 output = 240 total
Scenario: tests/test_collector.py::TestCollectorCollectionErrors::test_get_collection_errors_initially_empty
Why Needed: To ensure that the `get_collection_errors` method returns an empty list when the collection is initially empty.
Confidence: 80%
Tokens: 114 input + 103 output = 217 total
Scenario: tests/test_collector.py::TestCollectorMarkerExtraction::test_llm_context_override_default_none
Why Needed: Default llm_context_override should be None.
Confidence: 80%
Tokens: 136 input + 74 output = 210 total
Scenario: tests/test_collector.py::TestCollectorMarkerExtraction::test_llm_opt_out_default_false
Why Needed: The default value of llm_opt_out should be False.
Confidence: 80%
Tokens: 136 input + 89 output = 225 total
Scenario: Tests for Collector Output Capture
Why Needed: Test that output capture is enabled by default.
Confidence: 80%
Tokens: 104 input + 70 output = 174 total
Scenario: tests/test_collector.py::TestCollectorOutputCapture::test_capture_max_chars_default
Why Needed: The default value for capture output max chars is not being tested.
Confidence: 80%
Tokens: 108 input + 105 output = 213 total
Scenario: tests/test_collector.py::TestCollectorXfailHandling::test_xfail_failed_is_xfailed
Why Needed: To ensure that xfail failures are correctly recorded as xfailed.
Confidence: 80%
Tokens: 206 input + 78 output = 284 total
Scenario: tests/test_collector.py::TestCollectorXfailHandling::test_xfail_passed_is_xpassed
Why Needed: xfail passes should be recorded as xpassed.
Confidence: 80%
Tokens: 205 input + 75 output = 280 total
Scenario: Test the `create_collector` method of `TestCollector` class.
Why Needed: This test prevents a potential bug where the collector does not initialize with empty results, potentially leading to incorrect data being collected.
Confidence: 80%
Tokens: 205 input + 121 output = 326 total
Scenario: tests/test_collector.py::TestTestCollector::test_get_results_sorted
Why Needed: The test is necessary because it checks if the results are sorted by nodeid.
Confidence: 80%
Tokens: 227 input + 106 output = 333 total
Scenario: Test the `handle_collection_finish` method to ensure it correctly tracks collected and deselected counts.
Why Needed: This test prevents a potential issue where the collected count is not updated correctly when items are deselected.
Confidence: 80%
Tokens: 256 input + 105 output = 361 total
Scenario: tests/test_collector_maximal.py::TestCollectorInternals::test_capture_output_disabled_via_handle_report
Why Needed: To test that the collector does not capture output when config is disabled (integration via handle_runtest_logreport)
Confidence: 80%
Tokens: 211 input + 90 output = 301 total
Scenario: TestCollectorInternals test
Why Needed: To ensure that the `collector._capture_output` method correctly captures stderr output.
Confidence: 80%
Tokens: 157 input + 76 output = 233 total
Scenario: TestCollectorInternals test capture_output_stdout
Why Needed: To verify that the `test_capture_output_stdout` method correctly captures stdout.
Confidence: 80%
Tokens: 157 input + 69 output = 226 total
Scenario: tests/test_collector_internals
Why Needed: Truncating output exceeding max chars in TestCollectorInternals test.
Confidence: 80%
Tokens: 174 input + 65 output = 239 total
Scenario: Test creates a result with item markers.
Why Needed: Prevents regression where the collector does not extract item markers correctly when they are present in an item's callspec.
Confidence: 80%
Tokens: 382 input + 213 output = 595 total
Scenario: tests/test_collector_maximal.py::TestCollectorInternals::test_extract_error_string
Why Needed: To ensure the `_extract_error` method returns a string that can be used to recreate the original exception.
Confidence: 80%
Tokens: 130 input + 90 output = 220 total
Scenario: tests/test_collector_maximal.py::TestCollectorInternals::test_extract_skip_reason_fallback
Why Needed: To ensure the `_extract_skip_reason` method returns `None` when no longrepr is provided.
Confidence: 80%
Tokens: 130 input + 92 output = 222 total
Scenario: tests/test_collector_maximal.py::TestCollectorInternals::test_extract_skip_reason_string
Why Needed: To ensure the `extract_skip_reason` method returns a string as expected.
Confidence: 80%
Tokens: 133 input + 78 output = 211 total
Scenario: Test the TestCollector handleCollectionReport function when a collection report fails.
Why Needed: This test prevents a regression where the TestCollector does not correctly record and log collection errors.
Confidence: 80%
Tokens: 273 input + 112 output = 385 total
Scenario: Test the `handle_runtest_rerun` method of TestCollector.
Why Needed: This test prevents regression in handling reruns for a specific test case.
Confidence: 80%
Tokens: 281 input + 122 output = 403 total
Scenario: TestCollectorHandleRuntestSetupFailure verifies that the test collector correctly handles a setup failure in the runtest log report.
Why Needed: This test prevents regression by ensuring that the test collector correctly records and reports errors during setup.
Confidence: 80%
Tokens: 300 input + 106 output = 406 total
Scenario: Test Collector should record error if teardown fails after pass.
Why Needed: To prevent regression in case of a teardown failure, where the test is expected to fail but actually succeeds due to a teardown issue.
Confidence: 80%
Tokens: 391 input + 88 output = 479 total
Scenario: Test invalid context compression mode
Why Needed: To test that an invalid compression mode fails validation.
Confidence: 80%
Tokens: 124 input + 74 output = 198 total
Scenario: tests/test_context_compression.py::TestConfigValidation::test_negative_padding_invalid
Why Needed: Negative padding should fail validation.
Confidence: 80%
Tokens: 121 input + 81 output = 202 total
Scenario: TestConfigValidation
Why Needed: To ensure that valid compression modes pass validation.
Confidence: 80%
Tokens: 135 input + 64 output = 199 total
Scenario: tests/test_context_compression.py::TestConfigValidation::test_zero_padding_valid
Why Needed: Zero padding is a valid configuration option.
Confidence: 80%
Tokens: 122 input + 83 output = 205 total
Scenario: tests/test_context_compression.py::TestContextCompression::test_compression_enabled_by_default
Why Needed: Context compression should be enabled by default ('lines').
Confidence: 80%
Tokens: 119 input + 74 output = 193 total
Scenario: Tests for Context Compression
Why Needed: To ensure that the context compression is working correctly and providing the expected behavior.
Confidence: 80%
Tokens: 113 input + 77 output = 190 total
Scenario: tests/test_context_compression.py::TestContextCompression::test_line_padding_default
Why Needed: To ensure that line padding is correctly set to 2 when the context provider is 'none'.
Confidence: 80%
Tokens: 106 input + 89 output = 195 total
Scenario: Contiguous covered lines should not have gap indicators.
Why Needed: Prevents regression where contiguous lines are separated by gaps, potentially misleading the user about coverage.
Confidence: 80%
Tokens: 293 input + 94 output = 387 total
Scenario: tests/test_context_compression.py::TestExtractCoveredLines::test_empty_coverage
Why Needed: Empty coverage is necessary to test the ContextAssembler's ability to extract covered lines from a test with no actual execution.
Confidence: 80%
Tokens: 130 input + 79 output = 209 total
Scenario: Test that multiple covered ranges are extracted with gap indicators.
Why Needed: This test prevents regression where the output does not contain gap indicators between ranges of covered lines.
Confidence: 80%
Tokens: 274 input + 125 output = 399 total
Scenario: Single covered line should be extracted with padding.
Why Needed: This test prevents a regression where single lines are not extracted correctly due to missing padding.
Confidence: 80%
Tokens: 302 input + 118 output = 420 total
Scenario: Test Extracted Covered Lines: Padding should not go beyond file boundaries.
Why Needed: This test prevents a bug where padding exceeds the file boundary, potentially causing incorrect coverage metrics.
Confidence: 80%
Tokens: 288 input + 85 output = 373 total
Scenario: tests/test_context_limits.py::test_no_truncation_needed
Why Needed: This test is needed because the current implementation of `provider._build_prompt` truncates the context when it detects that a node's content exceeds the specified limit.
Confidence: 80%
Tokens: 158 input + 133 output = 291 total
Scenario: test_smart_distribution verifies that the smart distribution logic prevents a regression and optimizes F2's budget.
Why Needed: The test prevents a regression in the smart distribution logic by ensuring F2 gets at least their fair share of tokens, even if it means truncating some content to avoid waste.
Confidence: 80%
Tokens: 773 input + 169 output = 942 total
Scenario: The test verifies that the splitting logic correctly truncates strings and meets the expected requirements.
Why Needed: This test prevents a potential bug where the splitting logic does not truncate strings, leading to incorrect output or unexpected behavior.
Confidence: 80%
Tokens: 317 input + 109 output = 426 total
Scenario: The test verifies that the `provider._build_prompt` function truncates prompts to fit within a specified limit.
Why Needed: This test prevents regressions where large context files cause excessive prompting and context is not relevant.
Confidence: 80%
Tokens: 397 input + 111 output = 508 total
Scenario: tests/test_context_util.py::TestCollapseEmptyLines::test_collapse_three_empty_lines
Why Needed: The test is necessary because it checks the functionality of the `collapse_empty_lines` function when there are 3+ empty lines in a source string.
Confidence: 80%
Tokens: 128 input + 97 output = 225 total
Scenario: tests/test_context_util.py::TestCollapseEmptyLines::test_many_empty_lines
Why Needed: To test the functionality of collapsing many empty lines to one blank line.
Confidence: 80%
Tokens: 127 input + 88 output = 215 total
Scenario: tests/test_context_util.py::TestCollapseEmptyLines::test_preserve_two_empty_lines
Why Needed: Preserves up to 2 consecutive newlines in context.
Confidence: 80%
Tokens: 125 input + 83 output = 208 total
Scenario: tests/test_context_util.py::TestCollapseEmptyLines::test_single_newline
Why Needed: To test the functionality of collapsing empty lines in a single newline scenario.
Confidence: 80%
Tokens: 121 input + 90 output = 211 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_always_collapses_empty_lines
Why Needed: The test is necessary because it checks if the `optimize_context` function always collapses empty lines, regardless of the flags used.
Confidence: 80%
Tokens: 137 input + 102 output = 239 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_combined_optimization
Why Needed: To ensure that the combined optimization process is applied correctly to the test context.
Confidence: 80%
Tokens: 96 input + 110 output = 206 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_default_strips_docs_only
Why Needed: The default behavior of the `optimize_context` function should strip all docstrings, not just those in comments.
Confidence: 80%
Tokens: 100 input + 88 output = 188 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_empty_source
Why Needed: This test is needed because the current implementation of `optimize_context` does not handle empty sources correctly.
Confidence: 80%
Tokens: 95 input + 78 output = 173 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_source_with_only_whitespace
Why Needed: This test is needed because the current implementation does not handle source code with only whitespace characters correctly.
Confidence: 80%
Tokens: 115 input + 91 output = 206 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_strip_both
Why Needed: To optimize the context by removing unnecessary docstrings and comments.
Confidence: 80%
Tokens: 95 input + 78 output = 173 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_strip_comments_only
Why Needed: To optimize the context by removing unnecessary comment lines that do not affect the functionality of the code.
Confidence: 80%
Tokens: 95 input + 89 output = 184 total
Scenario: tests/test_context_util.py::TestOptimizeContext::test_strip_neither
Why Needed: To ensure that the optimizer can correctly handle cases where neither strip nor optimize is requested.
Confidence: 80%
Tokens: 94 input + 76 output = 170 total
Scenario: tests/test_context_util.py::TestStripComments::test_comment_after_string_with_hash
Why Needed: To ensure that the function correctly strips comments from strings containing a hash (#) symbol.
Confidence: 80%
Tokens: 134 input + 103 output = 237 total
Scenario: tests/test_context_util.py::TestStripComments::test_escaped_quotes
Why Needed: To ensure that the context utility correctly handles escaped quotes in strings.
Confidence: 80%
Tokens: 133 input + 80 output = 213 total
Scenario: tests/test_context_util.py::TestStripComments::test_mixed_quotes
Why Needed: To strip quotes from code that contains both single and double quotes.
Confidence: 80%
Tokens: 101 input + 77 output = 178 total
Scenario: tests/test_context_util.py::TestStripComments::test_no_comments
Why Needed: To strip comments from the source code, ensuring that only relevant information is included in the output.
Confidence: 80%
Tokens: 91 input + 84 output = 175 total
Scenario: tests/test_context_util.py::TestStripComments::test_preserve_hash_in_double_quoted_string
Why Needed: Preserves # inside double-quoted strings in source code.
Confidence: 80%
Tokens: 135 input + 104 output = 239 total
Scenario: tests/test_context_util.py::TestStripComments::test_preserve_hash_in_single_quoted_string
Why Needed: To ensure that comments are stripped from single-quoted strings, preserving the hash character (#) to maintain code readability.
Confidence: 80%
Tokens: 135 input + 96 output = 231 total
Scenario: tests/test_context_util.py::TestStripComments::test_strip_simple_comment
Why Needed: To remove simple end-of-line comments from the source code.
Confidence: 80%
Tokens: 119 input + 98 output = 217 total
Scenario: tests/test_context_util.py::TestStripComments::test_strip_standalone_comment
Why Needed: To strip standalone comments from the test source code.
Confidence: 80%
Tokens: 99 input + 89 output = 188 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_handles_syntax_error_gracefully
Why Needed: The test is necessary because it checks that the function does not modify the original source code when a syntax error occurs.
Confidence: 80%
Tokens: 119 input + 112 output = 231 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_multiple_docstrings
Why Needed: This test is needed because it checks if the context_util module can strip out multiple docstrings from a given string.
Confidence: 80%
Tokens: 95 input + 86 output = 181 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_preserves_multiline_data_strings
Why Needed: Preserve multiline data strings in docstrings.
Confidence: 80%
Tokens: 103 input + 151 output = 254 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_preserves_regular_strings
Why Needed: Preserve regular strings in test context
Confidence: 80%
Tokens: 102 input + 83 output = 185 total
LLM error: Failed to parse LLM response as JSON
Scenario: tests/test_context_util.py::TestStripDocstrings::test_strip_multiline_docstring
Why Needed: To ensure that the context utility function works correctly when dealing with multiline docstrings.
Confidence: 80%
Tokens: 97 input + 87 output = 184 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_strip_triple_double_quoted_docstring
Why Needed: To ensure that the context manager works correctly, especially when dealing with triple double-quoted docstrings.
Confidence: 80%
Tokens: 106 input + 93 output = 199 total
Scenario: tests/test_context_util.py::TestStripDocstrings::test_strip_triple_single_quoted_docstring
Why Needed: The test is necessary because the current implementation does not correctly strip triple single-quoted docstrings.
Confidence: 80%
Tokens: 106 input + 107 output = 213 total
Scenario: Tests the parsing of preferred models for a Gemini configuration with edge cases.
Why Needed: This test prevents regression in case the 'm1' or 'm2' model is not available, as it would cause an error when trying to parse them.
Confidence: 80%
Tokens: 273 input + 118 output = 391 total
Scenario: Verify that the rate limiter prevents over and under token limits when recording tokens but not requests.
Why Needed: This test prevents a potential bug where the rate limiter allows excessive token usage without preventing the request from being processed.
Confidence: 80%
Tokens: 273 input + 142 output = 415 total
Scenario: Verify that the `to_dict()` method of `SourceCoverageEntry` and `LlmAnnotation` classes returns the expected values for coverage percent, error message, and duration.
Why Needed: This test prevents a regression where the coverage percent is not correctly calculated for `SourceCoverageEntry` instances with multiple statements covered.
Confidence: 80%
Tokens: 318 input + 147 output = 465 total
Scenario: tests/test_coverage_map.py::TestCoverageMapper::test_create_mapper
Why Needed: To ensure the Mapper class initializes with a valid configuration.
Confidence: 80%
Tokens: 109 input + 91 output = 200 total
Scenario: tests/test_coverage_map.py::TestCoverageMapper::test_get_warnings
Why Needed: To ensure the get_warnings method returns a list of warnings as expected.
Confidence: 80%
Tokens: 110 input + 82 output = 192 total
Scenario: Test that the `map_coverage` function returns an empty dictionary when no coverage file exists.
Why Needed: Prevents regression in case of missing coverage files, ensuring accurate coverage reporting.
Confidence: 80%
Tokens: 277 input + 102 output = 379 total
Scenario: The test verifies that the `CoverageMapper` correctly extracts node IDs for all phases when the `include_phase` parameter is set to 'all'.
Why Needed: This test prevents a potential bug where the `CoverageMapper` does not extract node IDs for certain phases, leading to incorrect coverage reports.
Confidence: 80%
Tokens: 279 input + 134 output = 413 total
Scenario: tests/test_coverage_map.py::TestCoverageMapperContextExtraction::test_extract_nodeid_empty_context
Why Needed: To handle the case when the context is empty.
Confidence: 80%
Tokens: 128 input + 111 output = 239 total
Scenario: tests/test_coverage_map.py::TestCoverageMapperContextExtraction::test_extract_nodeid_filters_setup
Why Needed: To filter out setup phase when include_phase=run.
Confidence: 80%
Tokens: 139 input + 76 output = 215 total
Scenario: tests/test_coverage_map.py::TestCoverageMapperContextExtraction::test_extract_nodeid_with_run_phase
Why Needed: To extract the correct node ID from the run phase context.
Confidence: 80%
Tokens: 145 input + 90 output = 235 total
Scenario: Test 'test_contexts_by_lineno_exception' verifies that the test_contexts_by_lineno function handles exceptions correctly.
Why Needed: The test prevents a potential regression where the test_contexts_by_lineno function fails to handle an exception when accessing contexts for files with multiple lines of code.
Confidence: 80%
Tokens: 332 input + 140 output = 472 total
Scenario: Test Extract Contexts
Why Needed: When no measured files are present in the coverage data, an empty dictionary should be returned.
Confidence: 80%
Tokens: 136 input + 78 output = 214 total
Scenario: tests/test_coverage_map_coverage.py::TestExtractContexts::test_skip_non_python_files
Why Needed: To skip non-Python files from coverage reports.
Confidence: 80%
Tokens: 154 input + 76 output = 230 total
Scenario: TestLoadCoverageData
Why Needed: To test the scenario when no .coverage file exists.
Confidence: 80%
Tokens: 153 input + 93 output = 246 total
Scenario: Test that the test_analysis_exception_handling function prevents regression by catching Analysis2 exceptions and adding warnings.
Why Needed: This test verifies that the test_analysis_exception_handling function handles analysis2 exceptions correctly, preventing potential regressions.
Confidence: 80%
Tokens: 286 input + 122 output = 408 total
Scenario: Test handling when file has no statements.
Why Needed: To ensure that the coverage map is correctly handled when a file contains no statements.
Confidence: 80%
Tokens: 178 input + 75 output = 253 total
Scenario: Test that test files are included when omit_tests_from_coverage is False.
Why Needed: This test prevents regression in case the `omit_tests_from_coverage` configuration flag is set to True without specifying a custom list of test files.
Confidence: 80%
Tokens: 322 input + 123 output = 445 total
Scenario: test_skip_non_python_files
Why Needed: Skip non-Python files to ensure accurate coverage reporting.
Confidence: 80%
Tokens: 154 input + 56 output = 210 total
Scenario: Test that test files are skipped when omit_tests_from_coverage is True.
Why Needed: The test is necessary to ensure that test files are correctly skipped from the coverage report when `omit_tests_from_coverage` is set to `True`. This helps prevent false positives and ensures accurate reporting of covered code.
Confidence: 80%
Tokens: 182 input + 126 output = 308 total
Scenario: Test that all phases are accepted when configured.
Why Needed: Prevents regression in phase filtering functionality.
Confidence: 80%
Tokens: 305 input + 80 output = 385 total
Scenario: tests/test_coverage_map_coverage.py::TestPhaseFiltering::test_extract_nodeid_empty_string
Why Needed: To test that an empty string does not return a node ID.
Confidence: 80%
Tokens: 115 input + 85 output = 200 total
Scenario: tests/test_coverage_map_coverage.py::TestPhaseFiltering::test_extract_nodeid_none
Why Needed: To ensure that the `_extract_nodeid` method handles `None` inputs correctly and returns `None` as expected.
Confidence: 80%
Tokens: 114 input + 93 output = 207 total
Scenario: Test that run phase is the default filter.
Why Needed: This test prevents a regression where the default filter does not match the expected node ID.
Confidence: 80%
Tokens: 297 input + 120 output = 417 total
Scenario: Test that setup phase is correctly filtered when configured.
Why Needed: Prevents a regression where the test might fail due to incorrect filtering of nodeids in the setup phase.
Confidence: 80%
Tokens: 293 input + 125 output = 418 total
Scenario: Test that teardown phase is correctly filtered when configured.
Why Needed: This test prevents a potential bug where the teardown phase is not properly filtered, leading to incorrect coverage reporting.
Confidence: 80%
Tokens: 296 input + 125 output = 421 total
Scenario: tests/test_coverage_map_coverage.py::TestPhaseFiltering::test_extract_nodeid_without_pipe
Why Needed: This test is necessary because the current implementation of `CoverageMapper` does not correctly handle node IDs without phase delimiters.
Confidence: 80%
Tokens: 136 input + 93 output = 229 total
Scenario: Should exercise all paths in _extract_contexts to ensure full logic coverage.
Why Needed: This test prevents regression by verifying that the mapper extracts all necessary contexts for full logic coverage.
Confidence: 80%
Tokens: 413 input + 118 output = 531 total
Scenario: tests/test_coverage_map_maximal.py::TestCoverageMapperMaximal::test_extract_contexts_no_contexts
Why Needed: To test the coverage mapper's behavior when there are no test contexts.
Confidence: 80%
Tokens: 174 input + 73 output = 247 total
Scenario: The test verifies that the `CoverageMapper` correctly extracts node IDs for a scenario where there are missing lines in the code.
Why Needed: This test prevents a potential regression where the coverage map is not accurately reporting the number of covered lines due to missing or filtered code.
Confidence: 80%
Tokens: 323 input + 171 output = 494 total
Scenario: Test that the function correctly handles the case when no coverage files exist.
Why Needed: This test prevents a potential bug where the function would silently fail to load coverage data without raising an error.
Confidence: 80%
Tokens: 276 input + 113 output = 389 total
Scenario: Test that the test_load_coverage_data_read_error function handles errors reading coverage files correctly.
Why Needed: This test prevents a potential regression where the CoverageMapper class fails to handle errors when loading coverage data from corrupted or invalid files.
Confidence: 80%
Tokens: 343 input + 211 output = 554 total
Scenario: Test should handle parallel coverage files from xdist and verify that the CoverageMapper correctly updates its data.
Why Needed: This test prevents regression where the CoverageMapper does not update its data when loading coverage files from parallel directories.
Confidence: 80%
Tokens: 378 input + 304 output = 682 total
Scenario: Test that the `map_coverage` method returns an empty dictionary when `_load_coverage_data` returns None.
Why Needed: Prevents a potential bug where the `map_coverage` method does not handle cases where there is no coverage data to map.
Confidence: 80%
Tokens: 228 input + 167 output = 395 total
Scenario: Test that the CoverageMapper handles analysis errors during source coverage analysis.
Why Needed: This test prevents a regression where an error in analysis2 causes all files to be skipped without any meaningful output.
Confidence: 80%
Tokens: 274 input + 118 output = 392 total
Scenario: Verify that the test covers all paths in map_source_coverage with comprehensive coverage.
Why Needed: This test prevents regression by ensuring that all possible source files are covered under the given configuration.
Confidence: 80%
Tokens: 345 input + 123 output = 468 total
Scenario: Test the `make_warning` factory function to verify it returns a WarningCode.W001_NO_COVERAGE warning with the specified detail.
Why Needed: This test prevents a potential bug where the `make_warning` function does not correctly identify warnings without coverage files.
Confidence: 80%
Tokens: 236 input + 128 output = 364 total
Scenario: Test that warning codes have correct values.
Why Needed: Prevents a potential bug where the warning code values are incorrect, potentially leading to unexpected behavior or errors in the application.
Confidence: 80%
Tokens: 240 input + 167 output = 407 total
Scenario: Test ReportWarning.to_dict() method.
Why Needed: Prevents a warning that is not properly formatted in the report.
Confidence: 80%
Tokens: 276 input + 144 output = 420 total
Scenario: Test verifies that a warning is created with the correct code and message for known code.
Why Needed: To prevent a regression where warnings are not correctly generated when using known code.
Confidence: 80%
Tokens: 222 input + 125 output = 347 total
Scenario: tests/test_errors_maximal.py::TestMakeWarning::test_make_warning_unknown_code
Why Needed: To handle unknown WarningCode values that are not part of the enum.
Confidence: 80%
Tokens: 202 input + 83 output = 285 total
Scenario: tests/test_errors_maximal.py::TestMakeWarning::test_make_warning_with_detail
Why Needed: To test the creation of a warning with detail.
Confidence: 80%
Tokens: 127 input + 102 output = 229 total
Scenario: Tests failed
Why Needed: The test 'test_codes_are_strings' is expected to pass. However, it has raised a warning.
Confidence: 80%
Tokens: 109 input + 95 output = 204 total
Scenario: Tests for ReportWarning class
Why Needed: To ensure that the warning is correctly serialized to a dictionary without any additional details.
Confidence: 80%
Tokens: 147 input + 91 output = 238 total
Scenario: tests/test_fs.py::TestIsPythonFile::test_non_python_file
Why Needed: The test is checking if the function correctly identifies non-.py files.
Confidence: 80%
Tokens: 115 input + 144 output = 259 total
Scenario: tests/test_fs.py::TestIsPythonFile::test_python_file
Why Needed: The function `is_python_file()` should be able to identify .py files.
Confidence: 80%
Tokens: 98 input + 85 output = 183 total
Scenario: tests/test_fs.py::TestMakeRelative::test_makes_path_relative
Why Needed: To test the functionality of making a path relative to the current working directory.
Confidence: 80%
Tokens: 144 input + 79 output = 223 total
Scenario: tests/test_fs.py::TestMakeRelative::test_returns_normalized_with_no_base
Why Needed: To ensure that the `make_relative` function returns a normalized path when no base is provided.
Confidence: 80%
Tokens: 107 input + 80 output = 187 total
Scenario: tests/test_fs.py::TestNormalizePath::test_already_normalized
Why Needed: The current implementation of `normalize_path` does not correctly handle already-normalized paths.
Confidence: 80%
Tokens: 96 input + 82 output = 178 total
Scenario: tests/test_fs.py::TestNormalizePath::test_forward_slashes
Why Needed: To ensure that the `normalize_path` function correctly handles paths with forward slashes.
Confidence: 80%
Tokens: 100 input + 76 output = 176 total
Scenario: tests/test_fs.py::TestNormalizePath::test_strips_trailing_slash
Why Needed: To ensure that the `normalize_path` function correctly removes trailing slashes from file paths.
Confidence: 80%
Tokens: 102 input + 79 output = 181 total
Scenario: tests/test_fs.py::TestShouldSkipPath::test_custom_exclude_patterns
Why Needed: This test ensures that the `should_skip_path` function correctly handles custom pattern exclusion.
Confidence: 80%
Tokens: 126 input + 129 output = 255 total
Scenario: tests/test_fs.py::TestShouldSkipPath::test_normal_path
Why Needed: The test should be able to pass without skipping any path.
Confidence: 80%
Tokens: 96 input + 76 output = 172 total
Scenario: tests/test_fs.py::TestShouldSkipPath::test_skips_git
Why Needed: The current implementation of `should_skip_path` does not correctly handle `.git` directories.
Confidence: 80%
Tokens: 99 input + 106 output = 205 total
Scenario: tests/test_fs.py::TestShouldSkipPath::test_skips_pycache
Why Needed: Because the test case is testing the functionality of skipping __pycache__ directories.
Confidence: 80%
Tokens: 109 input + 77 output = 186 total
Scenario: tests/test_fs.py::TestShouldSkipPath::test_skips_venv
Why Needed: Because the test case `should_skip_path` is trying to check if a specific directory should be skipped, but it's actually causing an issue with the venv directories.
Confidence: 80%
Tokens: 121 input + 144 output = 265 total
Scenario: Verifies that a non-.py file does not match the expected behavior.
Why Needed: Prevents potential bugs where a non-.py file is incorrectly identified as Python code.
Confidence: 80%
Tokens: 210 input + 121 output = 331 total
Scenario: Testing if a file is a Python file.
Why Needed: Prevents a potential bug where a non-Python file is incorrectly identified as such.
Confidence: 80%
Tokens: 212 input + 118 output = 330 total
Scenario: Test makes a relative path not under the base directory when it is not.
Why Needed: Prevents regression where make_relative fails to return normalized absolute paths for non-relative paths.
Confidence: 80%
Tokens: 301 input + 135 output = 436 total
Scenario: Test Make Relative
Why Needed: To test the functionality of making a relative path to a file.
Confidence: 80%
Tokens: 147 input + 62 output = 209 total
Scenario: tests/test_fs_coverage.py::TestMakeRelative::test_make_relative_with_none_base
Why Needed: To ensure that the `make_relative` function correctly handles cases where the base is None.
Confidence: 80%
Tokens: 116 input + 82 output = 198 total
Scenario: tests/test_fs_coverage.py::TestNormalizePath::test_normalize_path_backslashes
Why Needed: To ensure that backslashes are correctly converted to forward slashes in file paths.
Confidence: 80%
Tokens: 114 input + 80 output = 194 total
Scenario: tests/test_fs_coverage.py::TestNormalizePath::test_normalize_path_path_object
Why Needed: To ensure the `normalize_path` function correctly normalizes path objects, specifically when dealing with file paths.
Confidence: 80%
Tokens: 110 input + 96 output = 206 total
Scenario: tests/test_fs_coverage.py::TestNormalizePath::test_normalize_path_trailing_slash
Why Needed: To ensure that the `normalize_path` function correctly removes trailing slashes from file paths.
Confidence: 80%
Tokens: 111 input + 82 output = 193 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_not_skip_regular_path
Why Needed: Regular paths are not skipped by default.
Confidence: 80%
Tokens: 120 input + 107 output = 227 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_skip_git
Why Needed: The test should skip the .git directory because it contains a Git hook that may be causing issues with the test.
Confidence: 80%
Tokens: 102 input + 83 output = 185 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_skip_path_starting_with_skip_dir
Why Needed: To ensure that the function correctly handles paths starting with a skip directory name.
Confidence: 80%
Tokens: 124 input + 146 output = 270 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_skip_pycache
Why Needed: Because the test module __pycache__ is being tested.
Confidence: 80%
Tokens: 116 input + 62 output = 178 total
Scenario: /usr/lib/python3.12/site-packages/pkg/mod.py
Why Needed: Because it's a site-package directory.
Confidence: 80%
Tokens: 111 input + 64 output = 175 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_skip_venv
Why Needed: The test is checking if venv directories are skipped by the `should_skip_path` function.
Confidence: 80%
Tokens: 130 input + 112 output = 242 total
Scenario: tests/test_fs_coverage.py::TestShouldSkipPath::test_should_skip_with_exclude_patterns
Why Needed: Custom exclude patterns are needed to skip certain files that contain sensitive information.
Confidence: 80%
Tokens: 132 input + 110 output = 242 total
Scenario: Test that the test_annotate_loop_daily_limit_hit function prevents a daily limit hit when the provider is configured with an empty _models list and a mock limiter that returns None for the daily limit.
Why Needed: This test prevents a potential issue where the provider would exceed its daily limit of requests per day, potentially causing unexpected behavior or errors in downstream applications.
Confidence: 80%
Tokens: 367 input + 393 output = 760 total
Scenario: Test that _GeminiRateLimitExceeded is raised when a request exceeds the limit.
Why Needed: To prevent regression where a request exceeds the rate limit and causes the model to be exhausted.
Confidence: 80%
Tokens: 730 input + 181 output = 911 total
Scenario: Prevents regression in coverage gaps by ensuring that the rate limiters are correctly configured and annotated for different scenarios.
Why Needed: This test prevents regression in coverage gaps because it ensures that the rate limiters are correctly configured and annotated for different scenarios, such as prompt_override, context too long error, RPD in parse_rate_limits, fallback models, input limits logic (Flash vs Pro).
Confidence: 80%
Tokens: 821 input + 238 output = 1059 total
Scenario: TestGeminiProvider
Why Needed: To ensure the GeminiProvider correctly handles cases where the `model` parameter is not provided or is set to 'ALL'.
Confidence: 80%
Tokens: 156 input + 73 output = 229 total
Scenario: TestGeminiProvider
Why Needed: To ensure the Gemini provider is correctly pruning daily requests that are older than 24 hours.
Confidence: 80%
Tokens: 157 input + 94 output = 251 total
Scenario: Verify that the test_tpm_available_fallback function waits for a sufficient time before allowing token requests.
Why Needed: This test prevents regression where the Gemini provider may not wait long enough for token requests to be processed after a previous request was made.
Confidence: 80%
Tokens: 524 input + 220 output = 744 total
Scenario: Test that the test_annotate_import_error function verifies when google-generativeai is not installed.
Why Needed: This test prevents a potential import error caused by missing required dependencies.
Confidence: 80%
Tokens: 259 input + 224 output = 483 total
Scenario: Test that annotation fails when token is missing from the environment.
Why Needed: Prevents a potential bug where the Gemini provider throws an error due to an unprovided GEMINI_API_TOKEN.
Confidence: 80%
Tokens: 313 input + 178 output = 491 total
Scenario: Test that the GeminiProvider correctly annotates a rate limit retry scenario.
Why Needed: This test prevents regression in the GeminiProvider's ability to handle rate limit retries.
Confidence: 80%
Tokens: 636 input + 171 output = 807 total
Scenario: Verify that the _annotate_success method returns a correct annotation when successful.
Why Needed: This test prevents regression in the GeminiProvider's _annotate_internal method, which may return an incorrect annotation if the response from _call_gemini is not in the expected format.
Confidence: 80%
Tokens: 649 input + 212 output = 861 total
Scenario: Verifies that the GeminiProvider class correctly checks for availability based on environment variables.
Why Needed: This test prevents a potential bug where the provider may not be available due to missing or incorrect environment variables.
Confidence: 80%
Tokens: 235 input + 147 output = 382 total
Scenario: Test that the GeminiProvider class correctly handles retry exceptions and model exhaustion when calling _annotate_internal with a mock ResourceExhausted exception.
Why Needed: This test prevents regression in the GeminiProvider class, where it may not handle retry exceptions or model exhaustion correctly when calling _annotate_internal.
Confidence: 80%
Tokens: 651 input + 317 output = 968 total
Scenario: Test that the GeminiProvider correctly clears _model_exhausted_at when annotating with a successful call to _call_gemini.
Why Needed: The test prevents regression where the _model_exhausted_at is not cleared after a successful annotation, potentially leading to incorrect assertions in other tests.
Confidence: 80%
Tokens: 482 input + 210 output = 692 total
Scenario: TestGeminiProviderDetailed::test_ensure_rate_limits_error
Why Needed: To test that the ` GeminiProvider` raises an exception when rate limiting is attempted with a non-numeric value.
Confidence: 80%
Tokens: 156 input + 100 output = 256 total
Scenario: TestGeminiProviderDetailed.test_fetch_available_models_error
Why Needed: To test the error handling of fetching available models when a network error occurs.
Confidence: 80%
Tokens: 132 input + 98 output = 230 total
Scenario: Test that fetching available models with invalid JSON data prevents a bug related to model validation.
Why Needed: This test verifies that the GeminiProvider class correctly handles invalid JSON input when fetching available models.
Confidence: 80%
Tokens: 340 input + 187 output = 527 total
Scenario: tests/test_gemini_provider.py::TestGeminiProviderDetailed::test_get_max_context_tokens_calls_ensure
Why Needed: To ensure that the `get_max_context_tokens` method of the GeminiProvider class calls the mock function correctly.
Confidence: 80%
Tokens: 144 input + 91 output = 235 total
Scenario: tests/test_gemini_provider.py::TestGeminiProviderDetailed::test_parse_rate_limits_types
Why Needed: This test ensures that the GeminiProvider can correctly parse rate limits from a JSON configuration.
Confidence: 80%
Tokens: 156 input + 103 output = 259 total
Scenario: Verify that the `prune_logic` method correctly removes old requests and updates token usage when a new request is added.
Why Needed: This test prevents regression in the `prune_logic` method, which may cause outdated data to be returned for existing requests.
Confidence: 80%
Tokens: 323 input + 180 output = 503 total
Scenario: tests/test_gemini_provider.py::TestGeminiRateLimiter::test_record_tokens_invalid
Why Needed: The test is failing because the rate limiter is not correctly handling invalid token records.
Confidence: 80%
Tokens: 127 input + 95 output = 222 total
Scenario: Test that the rate limiter does not exceed the limit when no requests are made.
Why Needed: The test ensures that the rate limiter does not allow more requests than allowed by the configuration, which would result in a 'RateLimitExceeded' error.
Confidence: 80%
Tokens: 129 input + 133 output = 262 total
Scenario: Verify that the rate limiter does not block the third request after two successful requests.
Why Needed: This test prevents a potential issue where the third request is blocked due to insufficient available time for subsequent requests.
Confidence: 80%
Tokens: 280 input + 130 output = 410 total
Scenario: Verify that the rate limiter correctly handles requests exceeding the limit when there are no tokens available.
Why Needed: This test prevents a potential bug where the rate limiter does not properly handle scenarios where there are no tokens available and more than one minute has passed since the last request.
Confidence: 80%
Tokens: 377 input + 114 output = 491 total
Scenario: Verify that the `wait_for_slot` method raises an exception when the daily limit is exceeded.
Why Needed: This test prevents a potential bug where the rate limiter does not raise an exception when the daily limit is exceeded, potentially causing unexpected behavior or errors in downstream systems.
Confidence: 80%
Tokens: 263 input + 126 output = 389 total
Scenario: Test that the `wait_for_slot` method sleeps for a sufficient amount of time when waiting for an available slot.
Why Needed: This test prevents regression where the rate limiter does not sleep long enough to allow subsequent requests to wait their turn, potentially leading to performance issues or errors.
Confidence: 80%
Tokens: 325 input + 86 output = 411 total
Scenario: tests/test_hashing.py::TestComputeConfigHash::test_different_config
Why Needed: To ensure that different configurations of the Compute API produce different hashes.
Confidence: 80%
Tokens: 119 input + 82 output = 201 total
Scenario: tests/test_hashing.py::TestComputeConfigHash::test_returns_short_hash
Why Needed: To ensure the computed hash is short and does not exceed 16 characters.
Confidence: 80%
Tokens: 109 input + 87 output = 196 total
Scenario: File hashing consistency
Why Needed: To ensure that the hash of a file matches its content.
Confidence: 80%
Tokens: 144 input + 60 output = 204 total
Scenario: Hashing a file
Why Needed: To test the correctness of the hash computation function.
Confidence: 80%
Tokens: 124 input + 114 output = 238 total
Scenario: tests/test_hashing.py::TestComputeHmac::test_different_key
Why Needed: To ensure that different keys produce different signatures.
Confidence: 80%
Tokens: 125 input + 65 output = 190 total
Scenario: tests/test_hashing.py::TestComputeHmac::test_with_key
Why Needed: To verify the correctness of HMAC computation with a key.
Confidence: 80%
Tokens: 108 input + 70 output = 178 total
Scenario: tests/test_hashing.py::TestComputeSha256::test_consistent
Why Needed: To ensure that the hash function is consistent and produces the same output for the same input.
Confidence: 80%
Tokens: 115 input + 80 output = 195 total
Scenario: tests/test_hashing.py::TestComputeSha256::test_length
Why Needed: To ensure the length of the computed SHA-256 hash is 64 characters (64 hexadecimal digits).
Confidence: 80%
Tokens: 103 input + 88 output = 191 total
Scenario: tests/test_hashing.py::TestGetDependencySnapshot::test_includes_pytest
Why Needed: To ensure that the 'pytest' package is included in the dependency snapshot.
Confidence: 80%
Tokens: 102 input + 86 output = 188 total
Scenario: tests/test_hashing.py::TestGetDependencySnapshot::test_returns_dict
Why Needed: To ensure that the `get_dependency_snapshot` function returns a dictionary as expected.
Confidence: 80%
Tokens: 98 input + 82 output = 180 total
Scenario: tests/test_hashing.py::TestLoadHmacKey::test_loads_key
Why Needed: To test that the hmac.load function correctly loads a key from a file.
Confidence: 80%
Tokens: 145 input + 82 output = 227 total
Scenario: tests/test_hashing.py::TestLoadHmacKey::test_missing_key_file
Why Needed: The test should return None if the key file does not exist.
Confidence: 80%
Tokens: 126 input + 76 output = 202 total
Scenario: tests/test_hashing.py::TestLoadHmacKey::test_no_key_file
Why Needed: Because the test case requires a valid HMAC key to be loaded.
Confidence: 80%
Tokens: 110 input + 74 output = 184 total
Scenario: Verify aggregation configuration defaults.
Why Needed: Prevents a potential bug where aggregation settings are not properly initialized with default values.
Confidence: 80%
Tokens: 201 input + 70 output = 271 total
Scenario: tests/test_integration_gate.py::TestConfigDefaults::test_capture_failed_output_default_true
Why Needed: The test captures failed output by default.
Confidence: 80%
Tokens: 107 input + 73 output = 180 total
Scenario: tests/test_integration_gate.py::TestConfigDefaults::test_context_mode_default_minimal
Why Needed: To ensure the context mode is set to 'minimal' by default.
Confidence: 80%
Tokens: 107 input + 77 output = 184 total
Scenario: tests/test_integration_gate.py::TestConfigDefaults::test_llm_not_enabled_by_default
Why Needed: LLM is currently disabled by default.
Confidence: 80%
Tokens: 109 input + 81 output = 190 total
Scenario: tests/test_integration_gate.py::TestConfigDefaults::test_omit_tests_default_true
Why Needed: The test is necessary because it checks the default behavior of omitting tests from coverage.
Confidence: 80%
Tokens: 109 input + 80 output = 189 total
Scenario: tests/test_integration_gate.py::TestConfigDefaults::test_provider_default_none
Why Needed: The provider is set to 'none' by default, which may not be suitable for all use cases.
Confidence: 80%
Tokens: 101 input + 64 output = 165 total
Scenario: Integration test of gate configuration
Why Needed: To ensure that secret files are excluded from the LLM context.
Confidence: 80%
Tokens: 132 input + 118 output = 250 total
Scenario: The test verifies that the deterministic output of the integration gate is correctly reported.
Why Needed: This test prevents a regression where the deterministic output may not be reported correctly due to changes in the test data or configuration.
Confidence: 80%
Tokens: 313 input + 84 output = 397 total
Scenario: Test that an empty test suite produces a valid report.
Why Needed: This test prevents regression in case the test suite is empty.
Confidence: 80%
Tokens: 240 input + 121 output = 361 total
Scenario: Test that the full pipeline generates an HTML report.
Why Needed: This test prevents regression where the pipeline does not generate an HTML report even when all tests pass.
Confidence: 80%
Tokens: 270 input + 108 output = 378 total
Scenario: Verify that a full pipeline generates a valid JSON report with the correct schema version, summary statistics, and skipped tests.
Why Needed: This test prevents regression in the integration gate by ensuring that the full pipeline correctly generates a JSON report with the expected structure and content.
Confidence: 80%
Tokens: 419 input + 240 output = 659 total
Scenario: Tests ReportRoot has required fields.
Why Needed: This test ensures that the report root contains all necessary fields for a valid schema.
Confidence: 80%
Tokens: 250 input + 92 output = 342 total
Scenario: {'description': 'RunMeta has aggregation fields.', 'expected_result': {'schema': {'type': 'object', 'properties': {'is_aggregated': {'type': 'boolean'}, 'run_count': {'type': 'integer'}}}, 'assertions': [{'key_assertion': ['is_aggregated'], 'expected_result': True}, {'key_assertion': ['run_count'], 'expected_result': 0}]}}
Why Needed: The test is necessary because the RunMeta object does not have an 'aggregation_fields' property. The presence of this property in the schema indicates that aggregation fields are required for a RunMeta object to be valid.
Confidence: 80%
Tokens: 136 input + 246 output = 382 total
Scenario: Test 'RunMeta has run status fields' verifies that the RunMeta object contains status fields.
Why Needed: This test prevents a potential regression where the RunMeta object is missing required status fields.
Confidence: 80%
Tokens: 237 input + 135 output = 372 total
Scenario: tests/test_integration_gate.py::TestSchemaCompatibility::test_schema_version_defined
Why Needed: The schema version is defined to ensure compatibility with gate APIs.
Confidence: 80%
Tokens: 103 input + 87 output = 190 total
Scenario: Test 'test_case_has_required_fields' verifies that the TestCaseResult object has required fields.
Why Needed: This test prevents a potential bug where a TestCaseResult object is created without all necessary fields (nodeid, outcome, duration).
Confidence: 80%
Tokens: 223 input + 116 output = 339 total
Scenario: Test that all retries are exhausted when API calls fail.
Why Needed: Prevents regression where LiteLLMProvider fails to retry after exhausting all retries.
Confidence: 80%
Tokens: 346 input + 140 output = 486 total
Scenario: Test that non-401 errors don't force token refresh.
Why Needed: Prevents regression in case of non-401 error without forcing token refresh.
Confidence: 80%
Tokens: 367 input + 159 output = 526 total
Scenario: Test that retry succeeds after transient error.
Why Needed: To ensure the LLM can recover from transient errors and still complete successfully.
Confidence: 80%
Tokens: 433 input + 102 output = 535 total
Scenario: Test that 401 error triggers token refresh when API call fails first, then succeeds.
Why Needed: To ensure the LLMTokenRefreshRetry test suite covers cases where the API call fails before a retry attempt.
Confidence: 80%
Tokens: 473 input + 229 output = 702 total
Scenario: tests/test_llm.py::TestGetProvider::test_gemini_returns_provider
Why Needed: The test is necessary because the Gemini model requires a specific provider to be used.
Confidence: 80%
Tokens: 131 input + 80 output = 211 total
Scenario: tests/test_llm.py::TestGetProvider::test_litellm_returns_provider
Why Needed: To ensure that the LiteLLMProvider class is correctly instantiated when a specific provider is used.
Confidence: 80%
Tokens: 140 input + 86 output = 226 total
Scenario: tests/test_llm.py::TestGetProvider::test_none_returns_noop
Why Needed: This test is necessary because the LLM's GetProvider method returns a NoopProvider when the provider is None.
Confidence: 80%
Tokens: 115 input + 138 output = 253 total
Scenario: tests/test_llm.py::TestGetProvider::test_ollama_returns_provider
Why Needed: To ensure that the OllamaProvider is correctly returned when a specific provider is specified.
Confidence: 80%
Tokens: 154 input + 80 output = 234 total
Scenario: Test that NoopProvider implements LlmProvider contract.
Why Needed: Prevents a potential bug where the NoopProvider does not implement all required methods of LlmProvider.
Confidence: 80%
Tokens: 232 input + 90 output = 322 total
Scenario: The test verifies that the NoopProvider returns an empty annotation when no annotation is specified.
Why Needed: This test prevents a regression where the NoopProvider does not return any annotation for a function with no annotations.
Confidence: 80%
Tokens: 249 input + 103 output = 352 total
Scenario: tests/test_llm.py::TestNoopProvider::test_get_model_name_empty
Why Needed: This test is needed because the model name is not being returned correctly when an empty string is passed to get_model_name.
Confidence: 80%
Tokens: 114 input + 100 output = 214 total
Scenario: tests/test_llm.py::TestNoopProvider::test_is_available
Why Needed: The LLM is not available.
Confidence: 80%
Tokens: 108 input + 74 output = 182 total
Scenario: This test is designed to ensure that the `ANNOTATION_JSON_SCHEMA` correctly requires certain fields.
Why Needed: The purpose of this test is to verify that the schema does not allow optional fields and that required fields are present.
Confidence: 80%
Tokens: 115 input + 160 output = 275 total
Scenario: Test that AnnotationSchema.from_dict parses a dictionary correctly.
Why Needed: Prevents incorrect parsing of user data from a dict.
Confidence: 80%
Tokens: 274 input + 55 output = 329 total
Scenario: This test checks if the AnnotationSchema can handle an empty input.
Why Needed: The test is necessary because the AnnotationSchema requires a non-empty string for the scenario and why-neededs fields.
Confidence: 80%
Tokens: 109 input + 108 output = 217 total
Scenario: Test case for testing AnnotationSchema
Why Needed: This test is necessary to ensure the AnnotationSchema handles partial input correctly.
Confidence: 80%
Tokens: 119 input + 98 output = 217 total
Scenario: The test verifies that the schema has required fields.
Why Needed: This test prevents a bug where the schema is missing required fields, potentially leading to errors or inconsistencies.
Confidence: 80%
Tokens: 215 input + 157 output = 372 total
Scenario: TestAnnotationSchema::test_schema_to_dict verifies that the AnnotationSchema instance correctly serializes to a dictionary.
Why Needed: This test prevents regression by ensuring that the AnnotationSchema instance can be serialized and deserialized correctly.
Confidence: 80%
Tokens: 247 input + 129 output = 376 total
Scenario: tests/test_llm_contract.py::TestNoopProvider::test_noop_from_factory
Why Needed: The test is necessary to ensure that the factory returns a NoopProvider for provider='none'.
Confidence: 80%
Tokens: 118 input + 86 output = 204 total
Scenario: tests/test_llm_contract.py::TestNoopProvider::test_noop_is_llm_provider
Why Needed: To ensure that the NoopProvider class correctly implements the LlmProvider interface.
Confidence: 80%
Tokens: 117 input + 84 output = 201 total
Scenario: The NoopProvider returns an empty annotation when the test function does not have any annotations.
Why Needed: This test prevents a regression where the NoopProvider incorrectly returns an empty annotation for tests with no annotations.
Confidence: 80%
Tokens: 253 input + 104 output = 357 total
Scenario: Verify that the `annotate` method returns a `TestCaseResult` object with the correct attributes.
Why Needed: This test prevents regression where the `annotate` method does not return an expected `TestCaseResult` object.
Confidence: 80%
Tokens: 263 input + 138 output = 401 total
Scenario: Provider handles empty code gracefully
Why Needed: Test case to ensure the provider can handle scenarios with empty code.
Confidence: 80%
Tokens: 145 input + 111 output = 256 total
Scenario: Provider handles None context gracefully
Why Needed: To ensure the provider can handle None context without throwing an error.
Confidence: 80%
Tokens: 148 input + 78 output = 226 total
Scenario: tests/test_llm_contract.py::TestProviderContract::test_provider_has_annotate_method
Why Needed: To ensure that all providers have an annotate method.
Confidence: 80%
Tokens: 145 input + 96 output = 241 total
Scenario: tests/test_llm_providers.py::TestGeminiProvider::test_annotate_handles_context_too_large
Why Needed: Because the annotation process is too resource-intensive for large contexts.
Confidence: 80%
Tokens: 98 input + 90 output = 188 total
Scenario: Test that the LiteLLMProvider annotates a missing dependency correctly.
Why Needed: This test prevents a bug where the provider does not report an error for missing dependencies.
Confidence: 80%
Tokens: 270 input + 116 output = 386 total
Scenario: Test that the `annotate` method of a GeminiProvider object raises an error when no API token is provided.
Why Needed: To prevent a potential bug where the `annotate` method fails to raise an error when an API token is missing, allowing the test case to pass even if the provider is not properly configured.
Confidence: 80%
Tokens: 440 input + 295 output = 735 total
Scenario: Verify that the `annotate_records_tokens` test prevents regressions by ensuring tokens are recorded correctly.
Why Needed: To prevent regressions caused by a change in how token usage is handled.
Confidence: 80%
Tokens: 783 input + 124 output = 907 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the LLM provider can correctly handle rate limiting and retry logic.
Confidence: 80%
Tokens: 98 input + 128 output = 226 total
Scenario: tests/test_llm_providers.py::TestGeminiProvider::test_annotate_rotates_models_on_daily_limit
Why Needed: To ensure that the model rotation is applied correctly on a daily limit.
Confidence: 80%
Tokens: 100 input + 100 output = 200 total
Scenario: tests/test_llm_providers.py::TestGeminiProvider::test_annotate_skips_on_daily_limit
Why Needed: Because the `annotate` method is skipping annotations on a daily limit.
Confidence: 80%
Tokens: 98 input + 151 output = 249 total
Scenario: Test that LiteLLM provider annotates a successful response with the correct key assertions and confidence level.
Why Needed: Prevents regression by ensuring the annotation is accurate for valid responses.
Confidence: 80%
Tokens: 474 input + 65 output = 539 total
Scenario: tests/test_llm_providers.py
Why Needed: The LLM provider's model should recover from being exhausted after 24 hours.
Confidence: 80%
Tokens: 104 input + 81 output = 185 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the `fetch_available_models` method raises an error when no models are available.
Confidence: 80%
Tokens: 92 input + 89 output = 181 total
Scenario: Tests for LLM providers
Why Needed: To ensure that the model list refreshes after a specified interval.
Confidence: 80%
Tokens: 96 input + 78 output = 174 total
Scenario: Test that LiteLLM provider retries on 401 after refreshing token.
Why Needed: Reason: The current implementation does not retry when the token is refreshed, causing a failure in test_401_retry_with_token_refresh.
Confidence: 80%
Tokens: 580 input + 186 output = 766 total
Scenario: The test verifies that the LiteLLMProvider annotates completion errors correctly.
Why Needed: This test prevents a regression where the provider does not surface completion errors in annotations.
Confidence: 80%
Tokens: 307 input + 188 output = 495 total
Scenario: Test that LiteLLMProvider rejects invalid key_assertions payloads.
Why Needed: To prevent regression and ensure the correct behavior of LiteLLMProvider when receiving invalid key_assertions payloads.
Confidence: 80%
Tokens: 346 input + 135 output = 481 total
Scenario: The LiteLLMProvider annotates the missing dependency 'litellm' in the test case.
Why Needed: This test prevents a potential bug where the provider reports an error due to a missing required library, causing the test to fail or produce incorrect results.
Confidence: 80%
Tokens: 271 input + 116 output = 387 total
Scenario: Test that the LiteLLMProvider annotates a valid response payload successfully.
Why Needed: Prevents regressions by ensuring the annotation is correct for successful responses.
Confidence: 80%
Tokens: 475 input + 61 output = 536 total
Scenario: Test that LiteLLMProvider overrides the prompt when provided.
Why Needed: To ensure that the LiteLLM provider uses the custom prompt instead of the default one.
Confidence: 80%
Tokens: 373 input + 159 output = 532 total
Scenario: Test LiteLLM provider to annotate with token usage.
Why Needed: Prevents regression in token usage extraction from responses.
Confidence: 80%
Tokens: 426 input + 127 output = 553 total
Scenario: Test that the LiteLLM provider passes `api_base` to completion call.
Why Needed: This test prevents regression in case where `api_base` is not set correctly.
Confidence: 80%
Tokens: 387 input + 222 output = 609 total
Scenario: The test verifies that the LiteLLMProvider passes a static API key to the completion call.
Why Needed: This test prevents regression in passing an API key through the completion call, ensuring consistent behavior with previous tests.
Confidence: 80%
Tokens: 384 input + 260 output = 644 total
Scenario: Test that the LiteLLM provider returns an authentication error when no refresher is configured.
Why Needed: This test prevents a bug where the provider does not raise an exception for authentication errors without token refresh.
Confidence: 80%
Tokens: 338 input + 141 output = 479 total
Scenario: Test that the LiteLLMProvider reports an authentication error when retrying after a second failure.
Why Needed: To prevent the test from passing if the LiteLLMProvider fails to report an authentication error on subsequent retries.
Confidence: 80%
Tokens: 419 input + 218 output = 637 total
Scenario: The test verifies that the LiteLLMProvider class handles a context too long error correctly.
Why Needed: This test prevents a bug where the provider throws an exception when given an invalid response containing a key assertion with no value.
Confidence: 80%
Tokens: 370 input + 144 output = 514 total
Scenario: tests/test_llm_providers.py::TestLiteLLMProvider::test_get_max_context_tokens_dict_format
Why Needed: To ensure that the LiteLLM provider correctly handles dict format from get_max_tokens.
Confidence: 80%
Tokens: 218 input + 84 output = 302 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure the LLM provider returns a valid JSON response when an error occurs.
Confidence: 80%
Tokens: 101 input + 98 output = 199 total
Scenario: tests/test_llm_providers.py::TestLiteLLMProvider::test_get_max_context_tokens_success
Why Needed: To test the get_max_context_tokens method of LiteLLMProvider.
Confidence: 80%
Tokens: 213 input + 99 output = 312 total
Scenario: tests/test_llm_providers.py::TestLiteLLMProvider::test_is_available_with_module
Why Needed: To ensure the LiteLLM provider can detect installed modules.
Confidence: 80%
Tokens: 160 input + 78 output = 238 total
Scenario: Test the LiteLLMProvider's token refresh integration.
Why Needed: The test prevents a potential bug where the TokenRefresher is not able to refresh tokens for a long time, causing the LLM provider to fail to authenticate.
Confidence: 80%
Tokens: 442 input + 222 output = 664 total
Scenario: The test verifies that the LiteLLMProvider retries transient errors and passes with the correct number of calls.
Why Needed: This test prevents a regression where the provider fails to retry on transient errors, potentially leading to unexpected behavior or failures in critical applications.
Confidence: 80%
Tokens: 426 input + 171 output = 597 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the LLM provider correctly handles context length errors during annotation.
Confidence: 80%
Tokens: 103 input + 79 output = 182 total
Scenario: Test OllamaProvider::test_annotate_handles_call_error verifies that the annotate method handles call errors by returning an appropriate error message.
Why Needed: This test prevents a regression where the annotation fails with a generic 'Failed after X retries. Last error:
Confidence: 80%
Tokens: 347 input + 193 output = 540 total
Scenario: The Ollama provider should report an error when the httpx dependency is missing.
Why Needed: This test prevents a bug where the provider incorrectly reports a missing dependency without providing any useful information.
Confidence: 80%
Tokens: 268 input + 105 output = 373 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the LLM provider can correctly annotate runtime errors and immediately fail.
Confidence: 80%
Tokens: 99 input + 162 output = 261 total
Scenario: Test the Ollama provider's full annotation flow with mocked HTTP responses.
Why Needed: Prevents authentication-related bugs in the Ollama provider.
Confidence: 80%
Tokens: 414 input + 190 output = 604 total
Scenario: Test that LiteLLM provider uses prompt_override when provided.
Why Needed: To ensure the correct behavior of the LiteLLM provider, where it overrides prompts with custom ones.
Confidence: 80%
Tokens: 373 input + 141 output = 514 total
Scenario: Tests the `annotate` method of `LiteLLMProvider` with token usage data.
Why Needed: Prevents regression in handling token usage from LiteLLM responses.
Confidence: 80%
Tokens: 426 input + 123 output = 549 total
Scenario: Test Ollama provider makes correct API call when calling OLLAMA successfully.
Why Needed: This test prevents regression where OLLAMA fails to make the API call with a valid response.
Confidence: 80%
Tokens: 470 input + 217 output = 687 total
Scenario: Ollama provider uses default model when not specified.
Why Needed: This test prevents a regression where the Ollama provider defaults to the 'llama3.2' model even if no model is provided in the config.
Confidence: 80%
Tokens: 344 input + 145 output = 489 total
Scenario: TestOllamaProvider::test_check_availability_failure
Why Needed: The test checks if the Ollama provider correctly returns False when the server is unavailable.
Confidence: 80%
Tokens: 183 input + 87 output = 270 total
Scenario: tests/test_llm_providers.py::TestOllamaProvider::test_check_availability_non_200
Why Needed: The test checks if the Ollama provider returns False for non-200 status codes.
Confidence: 80%
Tokens: 197 input + 104 output = 301 total
Scenario: The test verifies that the Ollama provider checks availability successfully by making a GET request to /api/tags.
Why Needed: This test prevents regression in case the API endpoint changes or is down, ensuring the Ollama provider can still function correctly.
Confidence: 80%
Tokens: 296 input + 113 output = 409 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the `get_max_context_tokens` method returns the correct context length key for a given scenario.
Confidence: 80%
Tokens: 99 input + 76 output = 175 total
Scenario: tests/test_llm_providers.py
Why Needed: To handle cases where the maximum context tokens are exceeded during training.
Confidence: 80%
Tokens: 101 input + 92 output = 193 total
Scenario: tests/test_llm_providers.py
Why Needed: To ensure that the `get_max_context_tokens` method returns the correct number of context tokens for a given model info.
Confidence: 80%
Tokens: 99 input + 84 output = 183 total
Scenario: Tests for LLM providers
Why Needed: To ensure the correct number of context tokens is returned from the `get_max_context_tokens_from_parameters` method.
Confidence: 80%
Tokens: 97 input + 97 output = 194 total
Scenario: tests/test_llm_providers.py::TestOllamaProvider::test_get_max_context_tokens_non_200_status
Why Needed: This test is needed because the `get_max_context_tokens` method returns a status of 200 when there are less than 200 context tokens, but it should return an error message instead.
Confidence: 80%
Tokens: 101 input + 140 output = 241 total
Scenario: tests/test_llm_providers.py::TestOllamaProvider::test_is_local_returns_true
Why Needed: To ensure the Ollama provider always returns `is_local=True`.
Confidence: 80%
Tokens: 123 input + 104 output = 227 total
Scenario: Ollama provider reports invalid JSON responses
Why Needed: To ensure the Ollama provider correctly handles and reports invalid JSON responses.
Confidence: 80%
Tokens: 138 input + 73 output = 211 total
Scenario: {'description': 'Test case for Ollama provider when invalid key_assertions are provided in the response data.', 'expected_output': {'error': 'Invalid response: key_assertions must be a list'}}
Why Needed: The test is necessary to ensure that the Ollama provider correctly handles invalid key_assertions payloads in its responses.
Confidence: 80%
Tokens: 174 input + 209 output = 383 total
Scenario: tests/test_llm_providers.py::TestOllamaProvider::test_parse_response_json_in_code_fence
Why Needed: To ensure that the Ollama provider correctly extracts JSON from markdown code fences.
Confidence: 80%
Tokens: 127 input + 121 output = 248 total
Scenario: tests/test_llm_providers.py::TestOllamaProvider::test_parse_response_json_in_plain_fence
Why Needed: to test the parsing of JSON responses from plain markdown fences (no language)
Confidence: 80%
Tokens: 128 input + 118 output = 246 total
Scenario: Test the Ollama provider's ability to parse valid JSON responses.
Why Needed: Prevents bugs in the LLM providers by ensuring that they correctly identify and extract relevant information from JSON responses.
Confidence: 80%
Tokens: 292 input + 113 output = 405 total
Scenario: Verify water-fill algorithm satisfies smaller files first.
Why Needed: This test prevents regression where the algorithm does not satisfy the constraint of distributing tokens to smaller files first due to insufficient budget.
Confidence: 80%
Tokens: 396 input + 136 output = 532 total
Scenario: tests/test_llm_utils.py::test_distribute_token_budget_empty
Why Needed: This test ensures that the `distribute_token_budget` function behaves correctly when given an empty input or no budget.
Confidence: 80%
Tokens: 115 input + 130 output = 245 total
Scenario: Verify fair sharing when neither fits.
Why Needed: Prevents regression where either L1 or L2 file gets more than half of the budget, leading to unfair token distribution.
Confidence: 80%
Tokens: 327 input + 170 output = 497 total
Scenario: tests/test_llm_utils.py::test_distribute_token_budget_max_files
Why Needed: Verify max_files limit.
Confidence: 80%
Tokens: 133 input + 110 output = 243 total
Scenario: Verify that the `distribute_token_budget` function allocates tokens to files in a sufficient manner.
Why Needed: This test prevents a potential bug where the token budget is insufficient, leading to incomplete or corrupted file contents.
Confidence: 80%
Tokens: 332 input + 116 output = 448 total
Scenario: Verify the rough token estimation (chars / 4) for an empty string.
Why Needed: Prevents a potential division by zero error when estimating tokens.
Confidence: 80%
Tokens: 217 input + 103 output = 320 total
Scenario: Test coverage entry serialization.
Why Needed: This test prevents a bug where the `CoverageEntry` object is not properly serialized to JSON.
Confidence: 80%
Tokens: 255 input + 123 output = 378 total
Scenario: Test that `CoverageEntry.to_dict()` returns the expected dictionary structure.
Why Needed: This test prevents a potential bug where the `CoverageEntry` object is not properly serialized to JSON.
Confidence: 80%
Tokens: 255 input + 136 output = 391 total
Scenario: Test coverage serialization for CoverageEntry.
Why Needed: This test prevents a potential bug where the coverage entry is not properly serialized to JSON.
Confidence: 80%
Tokens: 255 input + 177 output = 432 total
Scenario: An empty annotation should be created with default values.
Why Needed: This test prevents a regression where an empty annotation would result in a `NoneType` attribute.
Confidence: 80%
Tokens: 212 input + 124 output = 336 total
Scenario: The test verifies that the `LlmAnnotation` object can be serialized into a dictionary with required fields.
Why Needed: This test prevents regression by ensuring that the minimal annotation is properly serialized without any optional fields.
Confidence: 80%
Tokens: 230 input + 117 output = 347 total
Scenario: Test to dictionary with all fields
Why Needed: Prevents incorrect data representation in API responses
Confidence: 80%
Tokens: 284 input + 87 output = 371 total
Scenario: Test that the default report has a schema version and empty lists.
Why Needed: Prevents regression by ensuring the default report is correctly defined with required fields.
Confidence: 80%
Tokens: 231 input + 111 output = 342 total
Scenario: Test Report Root with collection errors should be verified.
Why Needed: This test prevents a regression where the report does not include all collection errors.
Confidence: 80%
Tokens: 237 input + 104 output = 341 total
Scenario: TestReportRoot::test_report_with_warnings
Why Needed: The test is necessary to ensure that the ReportWarning class correctly identifies and includes warnings in the report.
Confidence: 80%
Tokens: 144 input + 112 output = 256 total
Scenario: Tests should be sorted by nodeid in output.
Why Needed: Because the current implementation does not sort tests by nodeid, which can lead to incorrect test results.
Confidence: 80%
Tokens: 215 input + 120 output = 335 total
Scenario: The `to_dict()` method of the `ReportWarning` class is used to convert a warning object into a dictionary.
Why Needed: This test is needed because it checks if the detail attribute of the `ReportWarning` object is correctly populated in the returned dictionary.
Confidence: 80%
Tokens: 131 input + 116 output = 247 total
Scenario: Test to dictionary without detail should exclude it.
Why Needed: This test prevents a warning about missing detailed information in the report.
Confidence: 80%
Tokens: 223 input + 92 output = 315 total
Scenario: Test that RunMeta has aggregation fields.
Why Needed: Prevents regression where RunMeta is not aggregated by default.
Confidence: 80%
Tokens: 343 input + 127 output = 470 total
Scenario: Test that LLM fields are excluded when annotations are disabled.
Why Needed: This test prevents regression where LLMs are enabled but annotations are disabled, causing unexpected behavior.
Confidence: 80%
Tokens: 232 input + 105 output = 337 total
Scenario: Verify that LLM traceability fields are included when enabled.
Why Needed: Prevents regression in LLM model tracing functionality.
Confidence: 80%
Tokens: 327 input + 128 output = 455 total
Scenario: tests/test_models.py::TestRunMeta::test_non_aggregated_excludes_source_reports
Why Needed: It's necessary to ensure that non-aggregated reports do not include source_reports.
Confidence: 80%
Tokens: 130 input + 111 output = 241 total
Scenario: Test RunMeta to dict with all optional fields.
Why Needed: Prevents regression in case of missing or outdated plugin version, as it would lead to incorrect data being populated.
Confidence: 80%
Tokens: 483 input + 175 output = 658 total
Scenario: Test the RunMeta class to ensure it includes all necessary run status fields.
Why Needed: This test prevents a potential bug where the RunMeta object is missing certain critical fields that are required for proper functioning.
Confidence: 80%
Tokens: 285 input + 148 output = 433 total
Scenario: Test Schema Version Format
Why Needed: To ensure the schema version is in a valid semver format.
Confidence: 80%
Tokens: 115 input + 144 output = 259 total
Scenario: tests/test_models.py::TestSchemaVersion::test_schema_version_in_report_root
Why Needed: This test is necessary because the ReportRoot class does not include a schema version by default.
Confidence: 80%
Tokens: 119 input + 107 output = 226 total
Scenario: Test coverage entry serialization.
Why Needed: This test prevents a bug where the `CoverageEntry` class does not properly serialize its internal data to JSON.
Confidence: 80%
Tokens: 256 input + 134 output = 390 total
Scenario: The test verifies that the `to_dict` method of `LlmAnnotation` returns a dictionary with required fields.
Why Needed: This test prevents a potential bug where the minimal annotation is missing some required fields.
Confidence: 80%
Tokens: 229 input + 128 output = 357 total
Scenario: tests/test_models.py::TestSourceReport::test_to_dict_with_run_id
Why Needed: To ensure that the SourceReport object is correctly serializing its run_id attribute.
Confidence: 80%
Tokens: 134 input + 79 output = 213 total
Scenario: Test that `CoverageEntry.to_dict()` correctly serializes the test summary.
Why Needed: This test prevents a potential bug where the serialized test summary is not accurate due to incorrect formatting of line ranges.
Confidence: 80%
Tokens: 254 input + 174 output = 428 total
Scenario: Test that a minimal result has the required fields.
Why Needed: This test prevents regression where a minimal result is not provided with all necessary information.
Confidence: 80%
Tokens: 244 input + 117 output = 361 total
Scenario: tests/test_models.py::TestTestCaseResult::test_result_with_coverage verifies that the `result` dictionary contains a single 'coverage' key with a list of coverage entries.
Why Needed: This test prevents regression by ensuring that the `result` dictionary includes a 'coverage' key, which is necessary for calculating and displaying coverage statistics.
Confidence: 80%
Tokens: 256 input + 223 output = 479 total
Scenario: tests/test_models.py::TestTestCaseResult::test_result_with_llm_opt_out
Why Needed: To ensure that the LLM opt-out flag is correctly set in the test result.
Confidence: 80%
Tokens: 145 input + 90 output = 235 total
Scenario: Test case 'test_result_with_rerun' has been executed.
Why Needed: The test result is not being recorded in the database because reruns are disabled.
Confidence: 80%
Tokens: 162 input + 93 output = 255 total
Scenario: tests/test_models.py::TestTestCaseResult::test_result_without_rerun_excludes_fields
Why Needed: This test ensures that the `result` dictionary excludes fields related to reruns.
Confidence: 80%
Tokens: 152 input + 119 output = 271 total
Scenario: tests/test_models_coverage.py::TestReportRootToDict::test_to_dict_with_all_optional_fields
Why Needed: Prevents bar because llm_opt_out=True prevents the annotation from being generated for optional fields.
Confidence: 80%
Tokens: 454 input + 157 output = 611 total
LLM error: Failed to parse LLM response as JSON
Scenario: Test to_dict includes collection_errors when set.
Why Needed: This test prevents a regression where the to_dict method does not include collection_errors in the report.
Confidence: 80%
Tokens: 243 input + 92 output = 335 total
Scenario: Test to_dict includes custom_metadata when set.
Why Needed: Prevents regression in cases where custom metadata is required but not properly handled by the default to_dict method.
Confidence: 80%
Tokens: 264 input + 162 output = 426 total
Scenario: tests/test_models_coverage.py::TestReportRootToDict::test_to_dict_with_hmac_signature
Why Needed: to ensure that the `report.to_dict()` method includes an HMAC signature when it is set.
Confidence: 80%
Tokens: 128 input + 85 output = 213 total
Scenario: tests/test_models_coverage.py::TestReportRootToDict::test_to_dict_with_sha256
Why Needed: The test is necessary because the `to_dict` method of the ReportRoot class includes a SHA-256 hash when it is set.
Confidence: 80%
Tokens: 131 input + 92 output = 223 total
Scenario: Test to_dict includes source_coverage when set.
Why Needed: Prevents a potential bug where the test fails if the 'source_coverage' key is not present in the report dictionary, potentially leading to incorrect coverage analysis.
Confidence: 80%
Tokens: 282 input + 134 output = 416 total
Scenario: tests/test_models_coverage.py::TestReportRootToDict::test_to_dict_with_warnings
Why Needed: This test is necessary because the `to_dict()` method of ReportRoot includes warnings when set.
Confidence: 80%
Tokens: 151 input + 157 output = 308 total
Scenario: ...
Why Needed: ...
Confidence: 80%
Tokens: 153 input + 125 output = 278 total
Scenario: ...
Why Needed: ...
Confidence: 80%
Tokens: 131 input + 267 output = 398 total
Scenario: Test to_dict includes all optional fields when set.
Why Needed: This test prevents regression in coverage calculation when llm_opt_out is True.
Confidence: 80%
Tokens: 454 input + 193 output = 647 total
Scenario: tests/test_models_coverage.py::TestTestCaseResultToDict::test_to_dict_with_captured_stderr
Why Needed: to include captured_stderr in the JSON response when to_dict is used.
Confidence: 80%
Tokens: 149 input + 80 output = 229 total
Scenario: tests/test_models_coverage.py::TestTestCaseResultToDict::test_to_dict_with_captured_stdout
Why Needed: The `to_dict` method includes captured stdout when set.
Confidence: 80%
Tokens: 149 input + 78 output = 227 total
Scenario: tests/test_models_coverage.py::TestTestCaseResultToDict::test_to_dict_with_requirements
Why Needed: The `to_dict` method includes requirements when set. This is necessary because the `requirements` key in the test result dictionary is not a standard JSON key, but rather an attribute of the TestCaseResult class.
Confidence: 80%
Tokens: 151 input + 124 output = 275 total
Scenario: Verify that the default exclude globs are correctly set.
Why Needed: This test prevents a potential bug where the default exclude globs are not correctly set, potentially leading to unexpected behavior or errors.
Confidence: 80%
Tokens: 222 input + 161 output = 383 total
Scenario: Tests the default redact patterns configuration.
Why Needed: Prevents a potential security vulnerability where sensitive information like passwords and tokens are not properly redacted.
Confidence: 80%
Tokens: 228 input + 105 output = 333 total
Scenario: Test that default values are set correctly for the test_default_values scenario.
Why Needed: This test prevents a potential regression where the default values of the Config class are not set properly, potentially leading to unexpected behavior or errors in the application.
Confidence: 80%
Tokens: 318 input + 209 output = 527 total
Scenario: tests/test_options.py::TestConfig::test_get_default_config
Why Needed: To test the default configuration of the options.
Confidence: 80%
Tokens: 104 input + 84 output = 188 total
Scenario: Verify the is_llm_enabled check for different providers.
Why Needed: Prevents a potential bug where the test fails when using an unsupported provider.
Confidence: 80%
Tokens: 263 input + 112 output = 375 total
Scenario: test_validate_invalid_aggregate_policy
Why Needed: to test the validation of an invalid aggregation policy
Confidence: 80%
Tokens: 128 input + 110 output = 238 total
Scenario: tests/test_options.py::TestConfig::test_validate_invalid_context_mode
Why Needed: To ensure that the `validate()` method raises an error when a valid context mode is specified.
Confidence: 80%
Tokens: 131 input + 85 output = 216 total
Scenario: Tests for configuration options
Why Needed: To ensure that the `Config` class correctly validates an invalid provider.
Confidence: 80%
Tokens: 122 input + 68 output = 190 total
Scenario: Test validation of numeric constraints for TestConfig.
Why Needed: This test prevents a potential regression where the default values for LLM context bytes, max tests, requests per minute, timeout seconds, and max retries are not validated against their expected minimum or maximum values.
Confidence: 80%
Tokens: 329 input + 184 output = 513 total
Scenario: tests/test_options.py::TestConfig::test_validate_valid_config
Why Needed: To ensure that the `validate` method returns an empty list of errors when a valid configuration is passed.
Confidence: 80%
Tokens: 100 input + 85 output = 185 total
Scenario: Test loads aggregation options with correct directory, policy and run ID.
Why Needed: This test prevents a bug where the aggregate options are not loaded correctly due to incorrect or missing values in the mock configuration.
Confidence: 80%
Tokens: 295 input + 205 output = 500 total
Scenario: tests/test_options.py::TestLoadConfig::test_load_batch_flag_conflict
Why Needed: To test that the disabled batch flag works correctly.
Confidence: 80%
Tokens: 138 input + 90 output = 228 total
Scenario: Test handling when pyproject.toml doesn't exist.
Why Needed: Prevents regression in LLM configuration loading without a pyproject.toml file.
Confidence: 80%
Tokens: 413 input + 260 output = 673 total
Scenario: tests/test_options.py::TestLoadConfig::test_load_coverage_source
Why Needed: To test the coverage source option.
Confidence: 80%
Tokens: 126 input + 72 output = 198 total
Scenario: tests/test_options.py::TestLoadConfig::test_load_defaults
Why Needed: To test the default configuration when no options are set.
Confidence: 80%
Tokens: 116 input + 93 output = 209 total
Scenario: Test that CLI options override pyproject.toml options.
Why Needed: To test the ability of CLI options to override pyproject.toml settings.
Confidence: 80%
Tokens: 134 input + 131 output = 265 total
Scenario: Test that CLI provider option overrides pyproject.toml.
Why Needed: To ensure that the CLI provider option can override the default configuration in pyproject.toml, which is used to load dependencies.
Confidence: 80%
Tokens: 130 input + 114 output = 244 total
Scenario: tests/test_options.py::TestLoadConfig::test_load_from_cli_retries
Why Needed: To test the functionality of loading retries from the CLI.
Confidence: 80%
Tokens: 130 input + 86 output = 216 total
Scenario: Create a new directory with a pyproject.toml file
Why Needed: To test the functionality of loading values from a pyproject.toml file in an environment where the file does not exist.
Confidence: 80%
Tokens: 119 input + 87 output = 206 total
Scenario: Test loading token optimization options from CLI.
Why Needed: Prevents regression in token optimization configuration.
Confidence: 80%
Tokens: 264 input + 89 output = 353 total
Scenario: Verify that the test_cli_dependency_snapshot function correctly sets the dependency snapshot to 'deps.json' when an option is set.
Why Needed: This test prevents a potential regression where the CLI overrides for dependency snapshots are not being properly applied.
Confidence: 80%
Tokens: 213 input + 203 output = 416 total
Scenario: Testing the `test_cli_evidence_bundle` function to ensure it correctly sets the `llm_evidence_bundle` option to 'bundle.zip'.
Why Needed: This test prevents a potential bug where the `llm_evidence_bundle` option is not set correctly, potentially leading to incorrect evidence bundle reporting.
Confidence: 80%
Tokens: 217 input + 175 output = 392 total
Scenario: Verify that the `test_cli_report_json` test verifies that the `report_json` option is set to 'output.json' when CLI override for report JSON is enabled.
Why Needed: This test prevents a bug where the `report_json` option is not correctly overridden in the configuration, potentially leading to incorrect output.
Confidence: 80%
Tokens: 212 input + 207 output = 419 total
Scenario: Tests the CLI option to generate a report in PDF format.
Why Needed: Prevents regression where the test fails due to incorrect report PDF path.
Confidence: 80%
Tokens: 212 input + 184 output = 396 total
Scenario: test_validate_invalid_token_output_format
Why Needed: To ensure that the token output format is correctly validated and raises an error when it's invalid.
Confidence: 80%
Tokens: 130 input + 90 output = 220 total
Scenario: Test validation when token refresh interval is too short
Why Needed: Token refresh intervals should be at least 60 seconds to ensure sufficient time for the token to expire and be refreshed.
Confidence: 80%
Tokens: 146 input + 86 output = 232 total
Scenario: Test validation of valid LiteLLM configuration.
Why Needed: To ensure that the LiteLLM provider is correctly configured and validated without any errors.
Confidence: 80%
Tokens: 142 input + 91 output = 233 total
Scenario: test_load_aggregate_include_history
Why Needed: To ensure that the `aggregate_include_history` option is properly loaded and included in the generated code.
Confidence: 80%
Tokens: 118 input + 85 output = 203 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_aggregate_policy_from_pyproject
Why Needed: To ensure that the aggregate policy can be loaded from the PyProject.toml file.
Confidence: 80%
Tokens: 121 input + 121 output = 242 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_all_config_keys_combined
Why Needed: To ensure that all config keys are loaded when loading the entire pyproject.toml file.
Confidence: 80%
Tokens: 120 input + 136 output = 256 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_cache_dir
Why Needed: To ensure that the cache directory is loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 113 input + 89 output = 202 total
Scenario: Tests for PyProjectLoadingCoverage
Why Needed: To ensure that the cache TTL seconds are loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 116 input + 103 output = 219 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_capture_failed_output
Why Needed: To ensure that the `capture_failed_output` option in `pyproject.toml` is correctly loaded and used for coverage purposes.
Confidence: 80%
Tokens: 116 input + 140 output = 256 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_capture_output_max_chars
Why Needed: To ensure that the `capture_output_max_chars` option is properly loaded from the `pyproject.toml` file.
Confidence: 80%
Tokens: 119 input + 99 output = 218 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_context_bytes
Why Needed: To ensure that the context bytes are loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 113 input + 127 output = 240 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_context_exclude_globs
Why Needed: To ensure that the `context_exclude_globs` setting in `pyproject.toml` is properly excluded from coverage reports.
Confidence: 80%
Tokens: 119 input + 127 output = 246 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_context_file_limit
Why Needed: To ensure that the context file limit is correctly loaded from the pyproject.toml file.
Confidence: 80%
Tokens: 116 input + 138 output = 254 total
Scenario: Tests for `tests/test_options_coverage`
Why Needed: To ensure that the `context_include_globs` setting is correctly loaded from the `pyproject.toml` file.
Confidence: 80%
Tokens: 119 input + 103 output = 222 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_hmac_key_file
Why Needed: To ensure that the hmac_key_file is loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 118 input + 146 output = 264 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_include_param_values
Why Needed: To ensure that the `include_param_values` option is correctly loaded from the `pyproject.toml` file.
Confidence: 80%
Tokens: 116 input + 131 output = 247 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_include_phase
Why Needed: To ensure that the `include_phase` is loaded correctly from the PyProject.toml file.
Confidence: 80%
Tokens: 113 input + 150 output = 263 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_include_pytest_invocation
Why Needed: To ensure that the `include_pytest_invocation` option is correctly loaded from the PyProject.toml file.
Confidence: 80%
Tokens: 122 input + 101 output = 223 total
Scenario: Tests for pyproject.toml coverage
Why Needed: To ensure that the invocation_redact_patterns are correctly loaded and used in tests.
Confidence: 80%
Tokens: 121 input + 91 output = 212 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_litellm_api_base
Why Needed: To ensure that the litellm_api_base is loaded correctly from pyproject.toml.
Confidence: 80%
Tokens: 122 input + 129 output = 251 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_litellm_api_key
Why Needed: To ensure that the litellm API key is correctly loaded from the pyproject.toml file.
Confidence: 80%
Tokens: 122 input + 108 output = 230 total
Scenario: Loading litellm_token_json_key from pyproject.toml
Why Needed: To ensure that the litellm token JSON key is correctly loaded and used in the application.
Confidence: 80%
Tokens: 125 input + 151 output = 276 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_litellm_token_output_format
Why Needed: To ensure that the `litellm` package correctly handles token output formats in PyProject.toml.
Confidence: 80%
Tokens: 125 input + 114 output = 239 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_litellm_token_refresh_command
Why Needed: To ensure that the `litellm_token_refresh_command` is loaded correctly from the `pyproject.toml` file.
Confidence: 80%
Tokens: 125 input + 170 output = 295 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_litellm_token_refresh_interval
Why Needed: To ensure that the `litellm_token_refresh_interval` option is properly loaded from the PyProject.toml file, allowing for accurate coverage analysis.
Confidence: 80%
Tokens: 125 input + 107 output = 232 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_max_concurrency
Why Needed: To ensure that the `max_concurrency` setting in `pyproject.toml` is correctly loaded and used by the project.
Confidence: 80%
Tokens: 116 input + 169 output = 285 total
Scenario: Testing the ability to load max_tests from pyproject.toml
Why Needed: To ensure that the 'max_tests' setting in the pyproject.toml file can be loaded correctly and used by the tests.
Confidence: 80%
Tokens: 113 input + 119 output = 232 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_metadata_file
Why Needed: To ensure that the metadata file is loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 113 input + 118 output = 231 total
Scenario: test_pyproject_loading_coverage
Why Needed: To ensure that the ollama_host is loaded correctly from the pyproject.toml file.
Confidence: 80%
Tokens: 119 input + 104 output = 223 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_omit_tests_from_coverage
Why Needed: To ensure that the `omit_tests_from_coverage` option is correctly loaded from the `pyproject.toml` file.
Confidence: 80%
Tokens: 121 input + 108 output = 229 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_param_value_max_chars
Why Needed: To ensure that the `param_value_max_chars` option is correctly loaded from the `pyproject.toml` file and that its value is being used in the build process.
Confidence: 80%
Tokens: 119 input + 143 output = 262 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_report_collect_only
Why Needed: To ensure that the `report_collect_only` option is correctly loaded from the PyProject.toml file.
Confidence: 80%
Tokens: 116 input + 103 output = 219 total
Scenario: tests/test_options_coverage.py::TestPyprojectLoadingCoverage::test_load_timeout_seconds
Why Needed: To ensure that the timeout seconds are loaded correctly from pyproject.toml.
Confidence: 80%
Tokens: 113 input + 109 output = 222 total
Scenario: tests/test_options_coverage.py::TestPyprojectTokenOptimization::test_load_batch_max_tests
Why Needed: To ensure that the `batch_max_tests` option is correctly loaded from the PyProject.toml file and used to optimize Python packages.
Confidence: 80%
Tokens: 117 input + 156 output = 273 total
Scenario: test_load_batch_parametrized_tests
Why Needed: Optimization of PyProject token in batch parameterized tests
Confidence: 80%
Tokens: 123 input + 125 output = 248 total
Scenario: tests/test_options_coverage.py::TestPyprojectTokenOptimization::test_load_context_compression
Why Needed: To ensure that the context compression feature in Pytest is properly loaded and used.
Confidence: 80%
Tokens: 117 input + 127 output = 244 total
Scenario: tests/test_options_coverage.py::TestPyprojectTokenOptimization::test_load_context_line_padding
Why Needed: To ensure that the context line padding is correctly loaded and applied in the build process.
Confidence: 80%
Tokens: 117 input + 135 output = 252 total
Scenario: tests/test_options_coverage.py::TestPyprojectTokenOptimization::test_load_prompt_tier
Why Needed: To ensure that the `prompt_tier` is correctly loaded from the `pyproject.toml` file and used to determine the token optimization strategy.
Confidence: 80%
Tokens: 117 input + 148 output = 265 total
Scenario: tests/test_options_coverage.py::TestValidationCoverageExtended::test_validate_batch_max_tests_too_small
Why Needed: Because the `batch_max_tests` configuration option is not being used effectively.
Confidence: 80%
Tokens: 135 input + 113 output = 248 total
Scenario: tests/test_options_coverage.py::TestValidationCoverageExtended::test_validate_context_line_padding_negative
Why Needed: Negative context_line_padding is not allowed.
Confidence: 80%
Tokens: 129 input + 68 output = 197 total
Scenario: tests/test_options_coverage.py::TestValidationCoverageExtended::test_validate_invalid_context_compression
Why Needed: To ensure that the validation of context compression settings does not fail when an invalid value is provided.
Confidence: 80%
Tokens: 124 input + 88 output = 212 total
Scenario: test_validate_invalid_prompt_tier
Why Needed: To ensure that the `validate()` method correctly identifies and reports invalid `prompt_tier` values.
Confidence: 80%
Tokens: 125 input + 99 output = 224 total
Scenario: tests/test_plugin_integration.py::TestPluginConfigLoading::test_config_defaults
Why Needed: To ensure that the plugin configuration has safe defaults.
Confidence: 80%
Tokens: 119 input + 72 output = 191 total
Scenario: tests/test_plugin_integration.py::TestPluginConfigLoading::test_markers_exist_in_config
Why Needed: The test checks if markers exist in the plugin configuration.
Confidence: 80%
Tokens: 108 input + 74 output = 182 total
Scenario: Test generates both JSON and HTML reports for a test function.
Why Needed: This test prevents regression in cases where the plugin is used with both JSON and HTML output formats.
Confidence: 80%
Tokens: 279 input + 186 output = 465 total
Scenario: tests/test_plugin_integration.py::TestPluginHooksWithPytester::test_collection_finish_counts_items
Why Needed: pytest_collection_finish counts items (line 378)
Confidence: 80%
Tokens: 198 input + 96 output = 294 total
Scenario: The test verifies that the `test_pass()` function is executed and a new directory 'nested' with 'dir' as its parent directory is created.
Why Needed: This test prevents regression in cases where the plugin integration fails to create output directories.
Confidence: 80%
Tokens: 247 input + 167 output = 414 total
Scenario: Test that fixture errors are captured in report.
Why Needed: Fixture failures are not properly reported, leading to incorrect error counts and debugging issues.
Confidence: 80%
Tokens: 286 input + 113 output = 399 total
Scenario: Test pytest_runtest_makereport captures outcomes to verify that it correctly identifies all test outcomes.
Why Needed: pytest_runtest_makereport prevents regression by ensuring that the plugin correctly identifies and reports all test outcomes, including skipped tests.
Confidence: 80%
Tokens: 335 input + 165 output = 500 total
Scenario: tests/test_plugin_integration.py::TestPluginHooksWithPytester::test_no_report_when_disabled
Why Needed: To ensure that the plugin correctly handles cases where no output is specified.
Confidence: 80%
Tokens: 150 input + 88 output = 238 total
Scenario: Test that the `--llm-pdf` option enables the plugin and triggers its logic.
Why Needed: Prevents regression in plugin integration where --llm-pdf is used without enabling the plugin.
Confidence: 80%
Tokens: 435 input + 113 output = 548 total
Scenario: Test that pytest_sessionstart records start time is verified by Pytester.
Why Needed: This test prevents a potential bug where the start time of the session is not recorded correctly.
Confidence: 80%
Tokens: 276 input + 205 output = 481 total
Scenario: tests/test_plugin_integration.py
Why Needed: The requirement marker is used to mark requirements as having a plugin. This helps in identifying which requirements require plugins and can be useful for testing purposes.
Confidence: 80%
Tokens: 90 input + 88 output = 178 total
Scenario: Test the integration of report writer with pytest_llm_report.
Why Needed: This test prevents regression when integrating report writer with pytest_llm_report, as it ensures that all tests are properly formatted and include required information for a full report.
Confidence: 80%
Tokens: 417 input + 179 output = 596 total
Scenario: TestPluginCollectReport
Why Needed: To test the collectreport functionality when it is enabled.
Confidence: 80%
Tokens: 204 input + 68 output = 272 total
Scenario: tests/test_plugin_maximal.py::TestPluginCollectReport::test_pytest_collectreport_no_session
Why Needed: To ensure that collectreport does not throw an exception when a session is not available.
Confidence: 80%
Tokens: 138 input + 81 output = 219 total
Scenario: tests/test_plugin_maximal.py::TestPluginCollectReport::test_pytest_collectreport_session_none
Why Needed: To ensure that the collectreport plugin behaves correctly when a Pytest session is None.
Confidence: 80%
Tokens: 134 input + 115 output = 249 total
Scenario: tests/test_plugin_maximal.py::TestPluginConfigure::test_pytest_configure_llm_enabled_warning
Why Needed: LLM enabled warning is raised when pytest is run with the --llm flag.
Confidence: 80%
Tokens: 143 input + 115 output = 258 total
Scenario: tests/test_plugin_maximal.py::TestPluginConfigure::test_pytest_configure_validation_errors
Why Needed: Validation errors are raised when the pytest configuration is invalid.
Confidence: 80%
Tokens: 134 input + 87 output = 221 total
Scenario: TestPluginConfigure::test_pytest_configure_worker_skip
Why Needed: To ensure that the configure function skips on xdist workers correctly.
Confidence: 80%
Tokens: 170 input + 72 output = 242 total
Scenario: Test that fallback to load_config is triggered when Config.load is missing.
Why Needed: To prevent regression where Config.load is missing, and the plugin falls back to load_config.
Confidence: 80%
Tokens: 747 input + 179 output = 926 total
Scenario: tests/test_plugin_maximal.py::TestPluginLoadConfig::test_load_config_cli_overrides_pyproject
Why Needed: To test the plugin's ability to load configuration files with CLI options overriding those in pyproject.toml.
Confidence: 80%
Tokens: 140 input + 159 output = 299 total
Scenario: tests/test_plugin_maximal.py::TestPluginLoadConfig::test_load_config_from_pyproject
Why Needed: To ensure that the plugin can load configuration files from the PyPI repository.
Confidence: 80%
Tokens: 136 input + 105 output = 241 total
Scenario: Test that terminal summary skips when plugin is disabled.
Why Needed: Prevents a regression where the plugin's terminal summary might be incorrectly reported as enabled even though it's not.
Confidence: 80%
Tokens: 281 input + 179 output = 460 total
Scenario: tests/test_plugin_maximal.py::TestPluginMaximal::test_terminal_summary_worker_skip
Why Needed: To test that terminal summary skips on xdist worker.
Confidence: 80%
Tokens: 164 input + 74 output = 238 total
Scenario: Test config loading from pytest objects (CLI) to ensure the correct value is set for llm_report_html.
Why Needed: This test prevents a potential bug where the correct value for llm_report_html is not being set, potentially leading to incorrect configuration output.
Confidence: 80%
Tokens: 639 input + 587 output = 1226 total
Scenario: tests/test_plugin_maximal.py::TestPluginRuntest::test_runtest_makereport_disabled
Why Needed: The test is failing because the makereport hookwrapper is not completing successfully.
Confidence: 80%
Tokens: 220 input + 86 output = 306 total
Scenario: Test that makereport calls collector when enabled.
Why Needed: This test prevents a potential regression where the plugin does not report any errors even if makereport is called.
Confidence: 80%
Tokens: 371 input + 317 output = 688 total
Scenario: tests/test_plugin_maximal.py::TestPluginSessionHooks::test_pytest_collection_finish_disabled
Why Needed: This test is needed because the pytest_collection_finish function should skip collection finish when disabled.
Confidence: 80%
Tokens: 149 input + 103 output = 252 total
Scenario: TestPluginSessionHooks
Why Needed: To ensure that the `pytest_collection_finish` function calls the `_collector_key` collector when collection finish is enabled.
Confidence: 80%
Tokens: 219 input + 134 output = 353 total
Scenario: TestPluginSessionHooks
Why Needed: To ensure that the plugin correctly handles session start when disabled.
Confidence: 80%
Tokens: 157 input + 76 output = 233 total
Scenario: Test that sessionstart initializes collector when enabled and creates a stash with both get() and [] methods.
Why Needed: This test prevents a potential regression where the collector is not created or does not have access to the stash, potentially leading to incorrect data collection or other issues.
Confidence: 80%
Tokens: 335 input + 198 output = 533 total
Scenario: Test pytest_addoption adds expected arguments and verifies specific options.
Why Needed: pytest_addoption prevents a potential bug where the plugin does not add all required arguments to the parser.
Confidence: 80%
Tokens: 293 input + 117 output = 410 total
Scenario: tests/test_plugin_maximal.py::TestPluginTerminalSummary::test_pytest_addoption_no_ini
Why Needed: pytest_addoption no longer adds INI options
Confidence: 80%
Tokens: 140 input + 85 output = 225 total
Scenario: Test coverage percentage calculation logic for terminal summary.
Why Needed: Prevents regression in coverage reporting when terminal summary is enabled.
Confidence: 80%
Tokens: 395 input + 203 output = 598 total
Scenario: Test terminal summary with LLM enabled runs annotations.
Why Needed: Prevents regression by ensuring that the plugin is correctly configured when LLM is enabled.
Confidence: 80%
Tokens: 477 input + 152 output = 629 total
Scenario: Test terminal summary creates collector if missing.
Why Needed: The test prevents a potential bug where the plugin does not create a collector even when it is supposed to be present in the configuration.
Confidence: 80%
Tokens: 391 input + 157 output = 548 total
Scenario: Test terminal summary with aggregation enabled.
Why Needed: This test prevents regression in the case where aggregation is enabled and there are multiple terminals being reported.
Confidence: 80%
Tokens: 441 input + 159 output = 600 total
Scenario: Test coverage calculation error when loading coverage map.
Why Needed: This test prevents regression where the coverage calculation fails due to an OSError during load.
Confidence: 80%
Tokens: 389 input + 145 output = 534 total
Scenario: Tests the ContextAssembler with a balanced context configuration to ensure it correctly includes dependencies and passes coverage tests.
Why Needed: This test prevents regression by ensuring that the ContextAssembler correctly assembles a balanced context, including all necessary dependencies.
Confidence: 80%
Tokens: 331 input + 122 output = 453 total
Scenario: tests/test_prompts.py::TestContextAssembler::test_assemble_complete_context
Why Needed: To test the ContextAssembler's ability to assemble a complete context for a test file.
Confidence: 80%
Tokens: 176 input + 87 output = 263 total
Scenario: Test the ContextAssembler with minimal context mode and a test file.
Why Needed: This test prevents regression when using minimal context mode without specifying a repository root.
Confidence: 80%
Tokens: 267 input + 94 output = 361 total
Scenario: Test the ContextAssembler with balanced context limits to ensure it does not truncate long content within a file.
Why Needed: This test prevents bugs that may occur when the ContextAssembler is used with large files, causing the context to be truncated unnecessarily.
Confidence: 80%
Tokens: 335 input + 147 output = 482 total
Scenario: Test that 'complete' mode does not truncate long files despite a small llm_context_bytes limit.
Why Needed: This test prevents a regression where the LLM context size exceeds the file content size, causing truncation of long files in complete mode.
Confidence: 80%
Tokens: 361 input + 150 output = 511 total
Scenario: Verify the correct handling of non-existent files and nested test names with parameters.
Why Needed: This test prevents a potential bug where the ContextAssembler incorrectly handles cases where the test file does not exist or has nested test names with parameters.
Confidence: 80%
Tokens: 275 input + 135 output = 410 total
Scenario: The test verifies that the ContextAssembler should exclude certain Python files and directories from being processed.
Why Needed: This test prevents a potential bug where the ContextAssembler incorrectly includes certain files or directories in its processing, leading to unexpected behavior or errors.
Confidence: 80%
Tokens: 227 input + 114 output = 341 total
Scenario: Test assemble minimal mode returns no context files.
Why Needed: To prevent a regression where the assemble function does not generate any context files when run in minimal mode.
Confidence: 80%
Tokens: 298 input + 78 output = 376 total
Scenario: Test assemble respects llm_context_override from test.
Why Needed: This test prevents regression by ensuring the ContextAssembler uses the correct mode when overriding LLM context.
Confidence: 80%
Tokens: 362 input + 166 output = 528 total
Scenario: Test 'test_balanced_context_excludes_patterns' verifies that a balanced context excludes files matching exclude patterns.
Why Needed: This test prevents regression where the LLM context mode is set to 'balanced', and it includes files in the context that match exclude patterns.
Confidence: 80%
Tokens: 331 input + 254 output = 585 total
Scenario: tests/test_prompts_coverage.py::TestContextAssemblerEdgeCases::test_balanced_context_file_not_exists
Why Needed: To ensure that the ContextAssembler correctly handles cases where a balanced context file is not found.
Confidence: 80%
Tokens: 201 input + 96 output = 297 total
Scenario: Test that balanced context respects max bytes limit.
Why Needed: This test prevents a potential bug where the LLM context exceeds the maximum allowed bytes, causing truncated content.
Confidence: 80%
Tokens: 405 input + 180 output = 585 total
Scenario: tests/test_prompts_coverage.py::TestContextAssemblerEdgeCases::test_balanced_context_no_coverage
Why Needed: To ensure that the ContextAssembler can correctly assemble a balanced context with no coverage.
Confidence: 80%
Tokens: 162 input + 86 output = 248 total
Scenario: Test that loop exits when max bytes is reached before processing file.
Why Needed: Prevents a potential memory leak by ensuring the context assembler does not exceed the maximum allowed bytes before processing files.
Confidence: 80%
Tokens: 409 input + 110 output = 519 total
Scenario: tests/test_prompts_coverage.py::TestContextAssemblerEdgeCases::test_complete_context_delegates_to_balanced
Why Needed: To ensure that complete context delegates to balanced correctly.
Confidence: 80%
Tokens: 211 input + 87 output = 298 total
Scenario: Test _get_test_source with empty nodeid returns empty string
Why Needed: To ensure that the ContextAssembler correctly handles an empty node ID in the test source.
Confidence: 80%
Tokens: 148 input + 74 output = 222 total
Scenario: tests/test_prompts_coverage.py::TestContextAssemblerEdgeCases::test_get_test_source_extraction_stops_at_next_def
Why Needed: To ensure that source extraction stops at the next function definition, even if there are multiple definitions in a single file.
Confidence: 80%
Tokens: 129 input + 100 output = 229 total
Scenario: Edge Case: Test Source File Not Exists
Why Needed: The test assembly function `_get_test_source` should handle cases where the test source file does not exist.
Confidence: 80%
Tokens: 142 input + 90 output = 232 total
Scenario: tests/test_prompts_coverage.py::TestContextAssemblerEdgeCases::test_get_test_source_with_class
Why Needed: To ensure that the _get_test_source function correctly extracts functions with proper indentation, even when they are nested within other code blocks.
Confidence: 80%
Tokens: 118 input + 174 output = 292 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_consecutive_lines
Why Needed: To ensure that consecutive lines are compressed into a single range.
Confidence: 80%
Tokens: 106 input + 72 output = 178 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_duplicates
Why Needed: To test the handling of duplicate ranges in the compress_ranges function.
Confidence: 80%
Tokens: 107 input + 93 output = 200 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_empty_list
Why Needed: Because an empty list is considered a valid input for the `compress_ranges` function.
Confidence: 80%
Tokens: 92 input + 70 output = 162 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_mixed_ranges
Why Needed: To test the functionality of compressing mixed ranges in a list.
Confidence: 80%
Tokens: 130 input + 95 output = 225 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_non_consecutive_lines
Why Needed: To ensure that non-consecutive lines are correctly compressed to a single comma-separated value.
Confidence: 80%
Tokens: 113 input + 88 output = 201 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_single_line
Why Needed: The single line should be compressed using the range notation.
Confidence: 80%
Tokens: 96 input + 66 output = 162 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_two_consecutive
Why Needed: The test is necessary because the current implementation does not handle two consecutive lines correctly.
Confidence: 80%
Tokens: 103 input + 76 output = 179 total
Scenario: tests/test_ranges.py::TestCompressRanges::test_unsorted_input
Why Needed: The test is necessary to ensure that the `compress_ranges` function can handle unsorted input correctly.
Confidence: 80%
Tokens: 110 input + 86 output = 196 total
Scenario: tests/test_ranges.py::TestExpandRanges::test_empty_string
Why Needed: The current implementation does not handle empty strings correctly.
Confidence: 80%
Tokens: 90 input + 60 output = 150 total
Scenario: tests/test_ranges.py::TestExpandRanges::test_mixed
Why Needed: The test is necessary because it checks for the correct expansion of mixed ranges and singles.
Confidence: 80%
Tokens: 121 input + 113 output = 234 total
Scenario: tests/test_ranges.py::TestExpandRanges::test_range
Why Needed: The range function is not correctly expanding the input string.
Confidence: 80%
Tokens: 99 input + 99 output = 198 total
Scenario: compress_ranges and expand_ranges should be inverses.
Why Needed: This test ensures that the `compress_ranges` and `expand_ranges` functions are inverse operations, meaning they can be used to reconstruct the original list from a compressed representation.
Confidence: 80%
Tokens: 134 input + 116 output = 250 total
Scenario: tests/test_ranges.py::TestExpandRanges::test_single_number
Why Needed: To ensure that the `expand_ranges` function correctly handles a single number as input.
Confidence: 80%
Tokens: 95 input + 85 output = 180 total
Scenario: tests/test_render.py::TestFormatDuration::test_milliseconds verifies that the function correctly formats durations in milliseconds for times less than 1 second.
Why Needed: This test prevents a potential bug where the function does not format durations as expected for times less than 1 second, potentially leading to incorrect rendering of time-related content.
Confidence: 80%
Tokens: 211 input + 274 output = 485 total
Scenario: tests/test_render.py::TestFormatDuration::test_seconds
Why Needed: To ensure the function `format_duration` correctly formats time durations in seconds.
Confidence: 80%
Tokens: 116 input + 94 output = 210 total
Scenario: Test Outcome Mapping to CSS Classes
Why Needed: To ensure that all outcomes are correctly mapped to their corresponding CSS classes.
Confidence: 80%
Tokens: 263 input + 116 output = 379 total
Scenario: tests/test_render.py::TestOutcomeToCssClass::test_unknown_outcome
Why Needed: The test is necessary because it checks for the default CSS class when an unknown outcome is encountered.
Confidence: 80%
Tokens: 102 input + 91 output = 193 total
Scenario: The test verifies that a complete HTML document is rendered with the expected report content.
Why Needed: This test prevents a potential rendering issue where the report might not be displayed correctly due to missing or incorrect HTML elements.
Confidence: 80%
Tokens: 426 input + 149 output = 575 total
Scenario: Test renders coverage for fallback HTML test.
Why Needed: Prevents regression and ensures accurate coverage reporting.
Confidence: 80%
Tokens: 288 input + 90 output = 378 total
Scenario: tests/test_render.py::TestRenderFallbackHtml::test_renders_llm_annotation
Why Needed: This test prevents the rendering of LLM annotations with a low confidence score, which could be misleading and potentially lead to security vulnerabilities.
Confidence: 80%
Tokens: 317 input + 139 output = 456 total
Scenario: Test renders source coverage for fallback HTML.
Why Needed: Prevents a regression where the source coverage summary is not displayed correctly when using fallback HTML.
Confidence: 80%
Tokens: 331 input + 254 output = 585 total
Scenario: Test renders xpass summary for ReportRoot report.
Why Needed: This test prevents a regression where the 'xfailed/xpassed' summary is not rendered correctly when there are multiple failed and passed tests.
Confidence: 80%
Tokens: 283 input + 113 output = 396 total
Scenario: tests/test_report_writer.py::TestComputeSha256::test_different_content
Why Needed: To ensure that different content produces different hashes.
Confidence: 80%
Tokens: 115 input + 111 output = 226 total
Scenario: tests/test_report_writer.py::TestComputeSha256::test_empty_bytes
Why Needed: To ensure that the test suite is robust and can handle empty input data.
Confidence: 80%
Tokens: 129 input + 98 output = 227 total
Scenario: Test 'Run meta should include version info' verifies that the test report writer correctly includes version information in the build run metadata.
Why Needed: This test prevents regression where the test report writer does not include version information in the build run metadata, potentially leading to incorrect reporting or analysis of test results.
Confidence: 80%
Tokens: 318 input + 131 output = 449 total
Scenario: Test verifies that the `build_summary` method counts all outcome types correctly.
Why Needed: This test prevents a regression where the summary does not include all outcome types, potentially leading to incorrect reporting.
Confidence: 80%
Tokens: 336 input + 152 output = 488 total
Scenario: Test that the `build_summary_counts` method correctly counts outcomes in a test report.
Why Needed: This test prevents regression where the total count of passed, failed and skipped tests is not updated correctly.
Confidence: 80%
Tokens: 283 input + 165 output = 448 total
Scenario: Testing the creation of a ReportWriter instance with a valid configuration.
Why Needed: This test prevents potential bugs where a new ReportWriter instance is created without properly initializing its configuration.
Confidence: 80%
Tokens: 199 input + 117 output = 316 total
Scenario: Test writes a report with all assembled tests.
Why Needed: This test prevents regression where the report does not include all tests, potentially leading to incorrect reporting or missing important information.
Confidence: 80%
Tokens: 255 input + 99 output = 354 total
Scenario: tests/test_report_writer.py::TestReportWriter::test_write_report_includes_coverage_percent
Why Needed: To ensure that the ReportWriter class correctly calculates and returns the total coverage percentage in the report.
Confidence: 80%
Tokens: 132 input + 108 output = 240 total
Scenario: Test ReportWriter::test_write_report_includes_source_coverage verifies that the test writes a report with source coverage summary.
Why Needed: This test prevents regression where the report does not include source coverage information, which is crucial for debugging and tracking changes in codebase.
Confidence: 80%
Tokens: 291 input + 274 output = 565 total
Scenario: Test ReportWriter::test_write_report_merges_coverage verifies that the test writes a merged coverage report.
Why Needed: This test prevents regression where the coverage is not properly merged into tests, potentially leading to inaccurate reporting.
Confidence: 80%
Tokens: 285 input + 110 output = 395 total
Scenario: Test that the ReportWriterWithFiles class falls back to direct write if atomic write fails and writes warnings.
Why Needed: This test prevents a regression where the ReportWriterWithFiles class does not fall back to direct write when an atomic write operation fails, potentially leading to incorrect report generation or data loss.
Confidence: 80%
Tokens: 276 input + 133 output = 409 total
Scenario: Test case 'tests/test_report_writer.py::TestReportWriterWithFiles::test_creates_directory_if_missing'
Why Needed: Because the test writer does not create an output directory if it already exists.
Confidence: 80%
Tokens: 171 input + 109 output = 280 total
Scenario: Test that a directory creation failure prevents the capture of a warning code 'W201' from ReportWriter._ensure_dir.
Why Needed: This test prevents a regression where the report writer does not capture warnings when creating directories with permission issues.
Confidence: 80%
Tokens: 278 input + 129 output = 407 total
Scenario: Test 'test_git_info_failure' verifies that the `get_git_info` function handles git command failures gracefully by returning `None` for both SHA and dirty flag values.
Why Needed: This test prevents a regression where the `get_git_info` function fails to return expected values when it encounters a git command failure.
Confidence: 80%
Tokens: 231 input + 262 output = 493 total
Scenario: Test verifies that the report writer creates an HTML file with expected content.
Why Needed: This test prevents a regression where the report writer does not create an HTML file even if there are tests that fail or are skipped.
Confidence: 80%
Tokens: 366 input + 149 output = 515 total
Scenario: Test verifies that xfail outcomes are included in the HTML summary.
Why Needed: This test prevents regression where xfail results are not included in the report.
Confidence: 80%
Tokens: 308 input + 148 output = 456 total
Scenario: Test verifies that a JSON file is created with the report.
Why Needed: This test prevents regression where the report writer does not create a JSON file.
Confidence: 80%
Tokens: 265 input + 118 output = 383 total
Scenario: Test verifies that the `write_pdf` method creates a PDF file when Playwright is available.
Why Needed: This test prevents regression where the `report_writer` module does not create a PDF file when Playwright is installed.
Confidence: 80%
Tokens: 478 input + 186 output = 664 total
Scenario: Test should warn when Playwright is missing for PDF output.
Why Needed: To prevent a warning about missing Playwright for PDF output, which may indicate an issue with the test environment.
Confidence: 80%
Tokens: 311 input + 95 output = 406 total
Scenario: tests/test_report_writer_coverage.py::TestGetGitInfo::test_git_info_from_nonexistent_path
Why Needed: To test that the report writer does not attempt to write to a non-existent Git directory.
Confidence: 80%
Tokens: 123 input + 139 output = 262 total
Scenario: tests/test_report_writer_coverage.py::TestGetGitInfo::test_git_info_from_valid_repo
Why Needed: To ensure that the `get_git_info` function returns a valid git SHA for a valid repository.
Confidence: 80%
Tokens: 159 input + 103 output = 262 total
Scenario: Test falls back to git runtime when _git_info import fails.
Why Needed: Prevents a regression where the plugin's Git info cannot be retrieved due to an import failure.
Confidence: 80%
Tokens: 253 input + 199 output = 452 total
Scenario: test_get_plugin_git_info
Why Needed: to ensure plugin git info returns some values
Confidence: 80%
Tokens: 141 input + 84 output = 225 total
Scenario: Test atomic write fallback
Why Needed: To ensure that the ReportWriter can handle unexpected errors during an atomic write operation.
Confidence: 80%
Tokens: 181 input + 112 output = 293 total
Scenario: Test PDF generation with playwright exception when browser launch fails.
Why Needed: Prevents test from passing if playwright raises an exception during PDF generation.
Confidence: 80%
Tokens: 356 input + 149 output = 505 total
Scenario: Test PDF generation when playwright is not installed.
Why Needed: Prevents a potential bug where the report writer fails to create a PDF file due to playwright being missing.
Confidence: 80%
Tokens: 293 input + 101 output = 394 total
Scenario: Verify that _resolve_pdf_html_source creates a temporary file when no HTML source is provided.
Why Needed: Prevents regression where the test fails due to missing or empty HTML source configuration.
Confidence: 80%
Tokens: 265 input + 115 output = 380 total
Scenario: Verify that the test resolves an HTML source when a missing HTML file exists.
Why Needed: Prevents regression in case of missing HTML files, ensuring correct PDF generation.
Confidence: 80%
Tokens: 270 input + 145 output = 415 total
Scenario: The test verifies that the _resolve_pdf_html_source method uses an existing HTML file as its source.
Why Needed: This test prevents a potential bug where the method does not find any existing HTML files and therefore cannot resolve the PDF.
Confidence: 80%
Tokens: 266 input + 142 output = 408 total
Scenario: Test that `AnnotationSchema.from_dict` can create a full annotation from a dictionary with all required fields.
Why Needed: Prevents regression in case of missing required fields, ensuring the validation logic works correctly.
Confidence: 80%
Tokens: 276 input + 115 output = 391 total
Scenario: test_to_dict_full verifies that the AnnotationSchema can convert to a dictionary with all required fields.
Why Needed: This test prevents regression bugs in the AnnotationSchema where it may not be able to generate a full dictionary representation of an annotation.
Confidence: 80%
Tokens: 273 input + 128 output = 401 total
Scenario: The HTML report is generated correctly.
Why Needed: Prevents a potential issue where the test does not produce an HTML report even if the function `test_simple` passes.
Confidence: 80%
Tokens: 264 input + 147 output = 411 total
Scenario: test_html_summary_counts_all_statuses verifies that HTML summary counts include all statuses.
Why Needed: This test prevents regression where the HTML summary counts do not include all statuses, such as when there are multiple failed tests or skipped tests.
Confidence: 80%
Tokens: 621 input + 242 output = 863 total
Scenario: The JSON report is created and contains the expected schema version, summary statistics, and test counts.
Why Needed: This test prevents a regression where the report generation process fails to create a valid JSON file with the required metadata.
Confidence: 80%
Tokens: 295 input + 180 output = 475 total
Scenario: Verify that LLM annotations are included in the report when a provider is enabled.
Why Needed: Prevent regressions by ensuring LLM annotations are present in the report.
Confidence: 80%
Tokens: 385 input + 246 output = 631 total
Scenario: Test that LLM errors are surfaced in HTML output.
Why Needed: This test prevents a regression where LLM errors might not be reported in the expected format.
Confidence: 80%
Tokens: 313 input + 198 output = 511 total
Scenario: Verify that the LLM opt-out marker is correctly recorded in the test report.
Why Needed: This test prevents regression where a test might not record the LLM opt-out marker due to a missing or incorrect configuration.
Confidence: 80%
Tokens: 290 input + 140 output = 430 total
Scenario: Test the requirement marker functionality.
Why Needed: This test prevents a potential bug where the requirement marker is not recorded correctly, potentially leading to incorrect reporting or analysis of tests.
Confidence: 80%
Tokens: 307 input + 145 output = 452 total
Scenario: Test that multiple xfailed tests are recorded in the report.
Why Needed: This test prevents regression by ensuring that all xfailed tests are properly reported and counted.
Confidence: 80%
Tokens: 317 input + 136 output = 453 total
Scenario: Test that skipping tests prevents the 'skip' marker from being recorded.
Why Needed: This test prevents regression in case of a skipped test, ensuring that the expected number of skipped tests is reported correctly.
Confidence: 80%
Tokens: 264 input + 206 output = 470 total
Scenario: Test that xfailed tests are recorded and reported correctly.
Why Needed: This test prevents regression in the reporting of failed tests, ensuring accurate tracking of xfailed tests.
Confidence: 80%
Tokens: 264 input + 192 output = 456 total
Scenario: Test Parametrized Tests: Verify that parameterized tests are recorded separately and their results are reported correctly.
Why Needed: This test prevents regression by ensuring that the same test is run multiple times with different inputs, which can lead to false negatives if the test is not properly configured.
Confidence: 80%
Tokens: 290 input + 192 output = 482 total
Scenario: tests/test_smoke_pytester.py::TestPluginRegistration::test_help_contains_examples
Why Needed: This test is necessary to ensure that the CLI help text includes usage examples for the plugin registration feature.
Confidence: 80%
Tokens: 123 input + 106 output = 229 total
Scenario: TestPluginRegistration
Why Needed: To verify that LLM markers are registered.
Confidence: 80%
Tokens: 142 input + 115 output = 257 total
Scenario: tests/test_smoke_pytester.py::TestPluginRegistration::test_plugin_registered
Why Needed: To verify that the plugin is registered correctly.
Confidence: 80%
Tokens: 118 input + 77 output = 195 total
Scenario: Verify that special characters in nodeid do not cause Pytest to crash or produce invalid HTML reports.
Why Needed: This test prevents a potential regression where special characters in nodeids might cause Pytest to fail or produce corrupted report files.
Confidence: 80%
Tokens: 288 input + 162 output = 450 total
Scenario: tests/test_time.py::TestFormatDuration::test_boundary_one_minute
Why Needed: To ensure the `format_duration` function correctly formats durations in one minute.
Confidence: 80%
Tokens: 106 input + 84 output = 190 total
Scenario: tests/test_time.py::TestFormatDuration::test_microseconds_format
Why Needed: To ensure that the `format_duration` function correctly formats sub-millisecond durations as microseconds.
Confidence: 80%
Tokens: 121 input + 86 output = 207 total
Scenario: tests/test_time.py::TestFormatDuration::test_milliseconds_format
Why Needed: To ensure that the `format_duration` function correctly formats sub-second durations as milliseconds.
Confidence: 80%
Tokens: 119 input + 78 output = 197 total
Scenario: tests/test_time.py::TestFormatDuration::test_minutes_format
Why Needed: To ensure the `format_duration` function correctly formats durations over a minute, including minutes and seconds.
Confidence: 80%
Tokens: 124 input + 122 output = 246 total
Scenario: tests/test_time.py::TestFormatDuration::test_multiple_minutes
Why Needed: To ensure the `format_duration` function correctly formats multiple minutes into a human-readable string.
Confidence: 80%
Tokens: 112 input + 78 output = 190 total
Scenario: tests/test_time.py::TestFormatDuration::test_one_second
Why Needed: To ensure the `format_duration` function correctly formats a duration of exactly one second as '1.00s'.
Confidence: 80%
Tokens: 101 input + 102 output = 203 total
Scenario: tests/test_time.py::TestFormatDuration::test_seconds_format
Why Needed: To ensure that the `format_duration` function correctly formats seconds under a minute.
Confidence: 80%
Tokens: 110 input + 76 output = 186 total
Scenario: tests/test_time.py::TestFormatDuration::test_small_milliseconds
Why Needed: To ensure the `format_duration` function correctly formats small millisecond durations.
Confidence: 80%
Tokens: 111 input + 74 output = 185 total
Scenario: tests/test_time.py::TestFormatDuration::test_very_small_microseconds
Why Needed: To ensure that the `format_duration` function correctly formats very small durations as microseconds.
Confidence: 80%
Tokens: 116 input + 77 output = 193 total
Scenario: Test ISO Format with UTC
Why Needed: To verify the correct formatting of datetime objects with UTC timezone.
Confidence: 80%
Tokens: 143 input + 75 output = 218 total
Scenario: Tests for ISO Format
Why Needed: To ensure the naive datetime format is correct without timezone.
Confidence: 80%
Tokens: 136 input + 69 output = 205 total
Scenario: Test IsoFormat with microseconds
Why Needed: To test the functionality of formatting datetime objects with microseconds.
Confidence: 80%
Tokens: 133 input + 62 output = 195 total
Scenario: Test that the current time has a valid UTC timezone.
Why Needed: To ensure that the `datetime` object returned by `utc_now()` has a valid timezone.
Confidence: 80%
Tokens: 109 input + 147 output = 256 total
Scenario: tests/test_time.py::TestUtcNow::test_is_current_time
Why Needed: To ensure that the `utc_now` function returns a time within a reasonable tolerance of the current UTC time.
Confidence: 80%
Tokens: 116 input + 94 output = 210 total
Scenario: tests/test_time.py::TestUtcNow::test_returns_datetime
Why Needed: This test ensures that the function `utc_now()` returns a datetime object.
Confidence: 80%
Tokens: 94 input + 82 output = 176 total
Scenario: When TokenRefresher raises an error on command failure, then the test verifies that it correctly returns a TokenRefreshError with appropriate error message.
Why Needed: This test prevents potential regression where the TokenRefresher might not raise an error when encountering a command failure, potentially causing unexpected behavior or errors later in the application.
Confidence: 80%
Tokens: 310 input + 161 output = 471 total
Scenario: Verify TokenRefresher raises error on empty output when no token is available.
Why Needed: This test prevents a potential bug where the TokenRefresher does not raise an error when there is no token to refresh.
Confidence: 80%
Tokens: 297 input + 144 output = 441 total
Scenario: Test that forcing a refresh bypasses the cache and returns a new token.
Why Needed: This test prevents a regression where the TokenRefresher does not return a new token when forced to refresh.
Confidence: 80%
Tokens: 346 input + 230 output = 576 total
Scenario: Test that the `TokenRefresher` uses a custom JSON key for token refresh.
Why Needed: This test prevents a potential issue where the default JSON key is used instead of a custom one.
Confidence: 80%
Tokens: 303 input + 115 output = 418 total
Scenario: The test verifies that the `TokenRefresher` extracts a JSON object containing the extracted token from the expected JSON output.
Why Needed: This test prevents a potential bug where the `TokenRefresher` does not extract the token correctly if the output format is set to 'json'.
Confidence: 80%
Tokens: 308 input + 131 output = 439 total
Scenario: The test verifies that the `TokenRefresher` class correctly extracts a token from text output when the `output_format` is set to 'text'.
Why Needed: This test prevents a potential bug where the extracted token is not in the expected format, potentially leading to incorrect usage or unexpected behavior.
Confidence: 80%
Tokens: 298 input + 135 output = 433 total
Scenario: The test verifies that the TokenRefresher raises an error when it encounters invalid JSON.
Why Needed: This test prevents a potential bug where the TokenRefresher incorrectly interprets valid JSON as a token refresh request.
Confidence: 80%
Tokens: 299 input + 234 output = 533 total
Scenario: Test TokenRefresher.invalidate() clears cache and updates the token count correctly.
Why Needed: This test prevents a potential bug where the TokenRefresher does not update the token count after calling invalidate().
Confidence: 80%
Tokens: 340 input + 211 output = 551 total
Scenario: Test that TokenRefresher raises an error when the JSON key is missing.
Why Needed: To prevent a potential bug where the TokenRefresher fails to refresh tokens due to a missing required JSON key.
Confidence: 80%
Tokens: 325 input + 125 output = 450 total
Scenario: Test TokenRefresher thread safety by starting multiple threads concurrently and verifying that they all retrieve the same token.
Why Needed: This test prevents a potential bug where multiple threads accessing the TokenRefresher instance simultaneously could result in inconsistent or incorrect results due to race conditions.
Confidence: 80%
Tokens: 427 input + 134 output = 561 total
Scenario: The test verifies that the TokenRefresher handles command timeouts correctly.
Why Needed: This test prevents a potential bug where the TokenRefresher fails to refresh tokens when they timeout.
Confidence: 80%
Tokens: 279 input + 111 output = 390 total
Scenario: Test that TokenRefresher caches tokens and doesn't call the command again.
Why Needed: Prevents a potential bug where multiple requests to get a token would result in the same cached token being returned.
Confidence: 80%
Tokens: 353 input + 159 output = 512 total
Scenario: Test the TokenRefresher edge case when command fails with no stderr output.
Why Needed: To prevent a regression where the TokenRefresher does not raise an exception when the command fails without producing any error output.
Confidence: 80%
Tokens: 322 input + 261 output = 583 total
Scenario: Token Refresh Coverage
Why Needed: Test case for handling empty command string.
Confidence: 80%
Tokens: 151 input + 77 output = 228 total
Scenario: Test the test_invalid_command_string function to verify it handles an invalid command string (shlex parse error).
Why Needed: Prevents a potential bug where the TokenRefresher class incorrectly raises a TokenRefreshError when given an invalid command string.
Confidence: 80%
Tokens: 251 input + 213 output = 464 total
Scenario: Test that the test_json_not_dict scenario verifies a bug preventing handling of non-dict JSON output.
Why Needed: This test prevents regression by ensuring the TokenRefresher handles non-dict JSON output correctly.
Confidence: 80%
Tokens: 328 input + 110 output = 438 total
Scenario: Test handling when token value is an empty string.
Why Needed: Prevents TestTokenRefresherEdgeCases::test_json_token_empty_string from failing due to unexpected behavior of the TokenRefresher class.
Confidence: 80%
Tokens: 324 input + 140 output = 464 total
Scenario: Test handling when token value is not a string.
Why Needed: Prevents the TokenRefresher from attempting to refresh a non-string token value, which could lead to unexpected behavior or errors.
Confidence: 80%
Tokens: 326 input + 147 output = 473 total
Scenario: Test verifies that a TokenRefresher handles an OSError when executing a command.
Why Needed: This test prevents the regression of TokenRefreshError not being raised when executing commands that are not found.
Confidence: 80%
Tokens: 280 input + 122 output = 402 total
Scenario: Test handling when text output has only whitespace lines after initial strip, specifically when parsing with only blank lines.
Why Needed: Prevents a potential bug where the TokenRefresher fails to handle text output with only whitespace lines after an initial strip.
Confidence: 80%
Tokens: 376 input + 174 output = 550 total
Scenario: Test the test_whitespace_only_command to ensure it correctly raises a TokenRefreshError for an empty whitespace-only command string.
Why Needed: Prevents a potential bug where the TokenRefresher is not raised with a meaningful error message when given an empty whitespace-only command string.
Confidence: 80%
Tokens: 236 input + 141 output = 377 total
Scenario: Test token usage aggregation with mock stash and terminal reporter.
Why Needed: Prevents regression in token usage reporting for different test cases.
Confidence: 80%
Tokens: 775 input + 107 output = 882 total