Precision Selection Strategies - Complete Guide

This example demonstrates the five precision selection strategies available in M3S v0.6.0+ and compares their recommendations, so you can choose a precision level that fits your use case.

================================================================================
M3S Intelligent Precision Selection - All 5 Strategies
================================================================================

================================================================================
Strategy 1: Area-Based Selection
================================================================================

Use when: You know the desired cell size in km²
Examples: 'I need cells around 10 km²', 'Show me 100 hectare cells'

Finding precision for various target areas:
--------------------------------------------------------------------------------
Target:   1000.0 km² → Precision:  5 (Actual:  252.903 km², Deviation:  74.7%, Confidence: 0%)
Target:    100.0 km² → Precision:  6 (Actual:   36.129 km², Deviation:  63.9%, Confidence: 0%)
Target:     10.0 km² → Precision:  7 (Actual:    5.161 km², Deviation:  48.4%, Confidence: 0%)
Target:      1.0 km² → Precision:  8 (Actual:    0.737 km², Deviation:  26.3%, Confidence: 12%)
Target:      0.1 km² → Precision:  9 (Actual:    0.105 km², Deviation:   5.0%, Confidence: 83%)
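
The deviations above are large because H3 offers only one average cell area per precision level, and in this listing each level is roughly seven times smaller than the previous one. A minimal sketch of the selection, assuming the selector picks the level with the smallest relative deviation and that confidence falls off linearly to zero at the tolerance. Both assumptions are inferred from the numbers printed here, not taken from the M3S source, and `closest_precision` and `confidence` are hypothetical helpers:

```python
# Average H3 cell areas (km²) copied from the listings on this page.
H3_AREA_KM2 = {5: 252.903, 6: 36.129, 7: 5.161, 8: 0.737, 9: 0.105}


def closest_precision(target_area_km2, areas=H3_AREA_KM2):
    """Return (precision, relative deviation) for the closest average area."""

    def deviation(precision):
        return abs(areas[precision] - target_area_km2) / target_area_km2

    best = min(areas, key=deviation)
    return best, deviation(best)


def confidence(deviation, tolerance):
    """Assumed linear falloff: 1 at a perfect match, 0 at or beyond tolerance."""
    return max(0.0, 1.0 - deviation / tolerance)


for target in (1000.0, 100.0, 10.0, 1.0, 0.1):
    precision, dev = closest_precision(target)
    print(
        f"{target:8.1f} km² -> precision {precision}, "
        f"deviation {dev:.1%}, confidence {confidence(dev, 0.3):.0%}"
    )
```

With a tolerance of 0.3, as in the script, this should print the same precision, deviation, and confidence values as the listing above.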

================================================================================
Strategy 2: Count-Based Selection
================================================================================

Use when: You want a specific number of cells in a region
Examples: 'Split this city into ~100 cells', 'I want about 1000 cells here'

Finding precision for Manhattan area with different target counts:
--------------------------------------------------------------------------------
Target:   10 cells → Precision:  6 (Actual: ~   5 cells, Deviation:  50.0%, Confidence: 0%)
Target:   50 cells → Precision:  7 (Actual: ~  40 cells, Deviation:  20.0%, Confidence: 50%)
Target:  100 cells → Precision:  8 (Actual: ~ 286 cells, Deviation: 186.0%, Confidence: 0%)
Target:  500 cells → Precision:  8 (Actual: ~ 286 cells, Deviation:  42.8%, Confidence: 0%)
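
The jump from about 40 cells at precision 7 to about 286 at precision 8 reflects the roughly sevenfold area ratio between adjacent H3 levels, so targets that fall between those counts cannot be matched closely. The counts themselves are consistent with dividing the bounding-box area by the average cell area. A rough sketch under that assumption, using a planar approximation (the helper names and the 111.32 km-per-degree constant are illustrative and not part of M3S):

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def bbox_area_km2(min_lat, min_lon, max_lat, max_lon):
    """Planar approximation of a small bounding box's area in km²."""
    mid_lat = math.radians((min_lat + max_lat) / 2)
    height_km = (max_lat - min_lat) * KM_PER_DEGREE
    # Degrees of longitude shrink with the cosine of the latitude.
    width_km = (max_lon - min_lon) * KM_PER_DEGREE * math.cos(mid_lat)
    return height_km * width_km


def estimated_cells(bounds, cell_area_km2):
    """Estimate how many cells of the given average area cover the box."""
    return bbox_area_km2(*bounds) / cell_area_km2


manhattan_bounds = (40.70, -74.05, 40.85, -73.90)
# Average H3 cell areas for precisions 6, 7 and 8, taken from this page.
for precision, area in ((6, 36.129), (7, 5.161), (8, 0.737)):
    cells = estimated_cells(manhattan_bounds, area)
    print(f"precision {precision}: ~{cells:.0f} cells")
```

For the Manhattan box this gives estimates close to the 5, 40, and 286 cells shown in the listing.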

================================================================================
Strategy 3: Use-Case Based Selection (Curated Presets)
================================================================================

Use when: You have a common spatial analysis use case
Examples: Analyzing neighborhoods, city planning, country-level analysis

H3 precision recommendations for common use cases:
--------------------------------------------------------------------------------
global          → Precision:  0 (Avg area:  4357449.416 km², Confidence: 95%)
continental     → Precision:  2 (Avg area:    86801.780 km², Confidence: 95%)
country         → Precision:  3 (Avg area:    12392.264 km², Confidence: 95%)
region          → Precision:  5 (Avg area:      252.903 km², Confidence: 95%)
city            → Precision:  7 (Avg area:        5.161 km², Confidence: 95%)
neighborhood    → Precision:  9 (Avg area:        0.105 km², Confidence: 95%)
street          → Precision: 11 (Avg area:        0.002 km², Confidence: 95%)
building        → Precision: 13 (Avg area:        0.000 km², Confidence: 95%)

Same use case ('city') across different grid systems:
--------------------------------------------------------------------------------
 system  precision  area_km2  confidence
geohash          5     2.443        0.95
     h3          7     5.161        0.95
     s2         16     0.020        0.95
quadkey         13     7.600        0.95

================================================================================
Strategy 4: Distance-Based Selection
================================================================================

Use when: You care about cell edge length rather than area
Examples: 'Cells with ~100m edges', 'I need 1km grid spacing'

Finding precision for various target edge lengths:
--------------------------------------------------------------------------------
Target:  10000 m → Precision:  5 (Actual: ~9866.4 m, Deviation:   1.3%, Confidence: 96%)
Target:   5000 m → Precision:  6 (Actual: ~3729.1 m, Deviation:  25.4%, Confidence: 15%)
Target:   1000 m → Precision:  8 (Actual: ~ 532.6 m, Deviation:  46.7%, Confidence: 0%)
Target:    500 m → Precision:  8 (Actual: ~ 532.6 m, Deviation:   6.5%, Confidence: 78%)
Target:    100 m → Precision: 10 (Actual: ~  76.0 m, Deviation:  24.0%, Confidence: 20%)
Target:     50 m → Precision: 11 (Actual: ~  27.7 m, Deviation:  44.5%, Confidence: 0%)

================================================================================
Strategy 5: Performance-Based Selection
================================================================================

Use when: You need to balance precision vs computational cost
Examples: Real-time applications, limited compute budget, large regions

Performance-optimized precision for different scenarios:
--------------------------------------------------------------------------------
point_query     (budget:    10ms, region:   1000 km²)
  → Precision:  8, Est. cells:  1356, Est. time:   1.9 ms
intersect       (budget:   100ms, region:    500 km²)
  → Precision:  8, Est. cells:   678, Est. time:  72.8 ms
conversion      (budget:   200ms, region:    100 km²)
  → Precision:  8, Est. cells:   135, Est. time:  77.5 ms
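
All three scenarios land on precision 8, and the estimated cell counts match the region area divided by the 0.737 km² average cell area at that level. One way to read this is "pick the finest precision whose estimated time fits the budget". The sketch below illustrates that idea with a linear cost model. The per-cell costs are back-computed from this listing and the function is hypothetical, so treat it as an illustration rather than M3S's actual algorithm:

```python
# Average H3 cell areas (km²) copied from the listings on this page.
H3_AREA_KM2 = {5: 252.903, 6: 36.129, 7: 5.161, 8: 0.737, 9: 0.105}

# Hypothetical per-cell costs in ms, derived from "Est. time / Est. cells"
# in the listing above. These are assumptions, not values from M3S.
COST_MS_PER_CELL = {
    "point_query": 1.9 / 1356,
    "intersect": 72.8 / 678,
    "conversion": 77.5 / 135,
}


def finest_within_budget(operation, budget_ms, region_km2):
    """Return (precision, cells, time_ms) for the finest level within budget."""
    best = None
    for precision in sorted(H3_AREA_KM2):
        cells = int(region_km2 / H3_AREA_KM2[precision])
        time_ms = cells * COST_MS_PER_CELL[operation]
        # Cost grows with precision, so the last level that fits is the finest.
        if time_ms <= budget_ms:
            best = (precision, cells, time_ms)
    return best


for op, budget, region in (
    ("point_query", 10.0, 1000.0),
    ("intersect", 100.0, 500.0),
    ("conversion", 200.0, 100.0),
):
    print(op, finest_within_budget(op, budget, region))
```

Under these assumed costs, precision 9 would exceed the budget in every scenario, which is consistent with the listing stopping at precision 8.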

================================================================================
Practical Example: Combining Strategies in Real Workflow
================================================================================

Scenario: Analyzing neighborhoods in San Francisco
--------------------------------------------------------------------------------

1. Use-case based approach:
   Precision: 9, Confidence: 95%
   H3 precision 9 optimized for 'neighborhood' use case (avg cell area: 0.10 km²)

2. Area-based approach (target 0.5 km² cells):
   Precision: 8, Confidence: 0%
   Actual area: 0.737 km²

3. Count-based approach (target 200 cells):
   Precision: 8, Confidence: 0%
   Estimated cells: 271

4. Distance-based approach (target 500m edges):
   Precision: 8, Confidence: 78%
   Actual edge length: 532.6 m

Comparing all recommendations:
--------------------------------------------------------------------------------
      Strategy  Precision Confidence Area (km²)
      Use-case          9        95%      0.105
    Area-based          8         0%      0.737
   Count-based          8         0%      0.737
Distance-based          8        78%     ~0.284
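
The distance-based row reports ~0.284 km² although it recommends the same precision 8 as the rows showing 0.737 km². The script estimates that area as edge², which is the formula for a square. For a regular hexagon the area is (3√3/2)·edge², and applying it to the 532.6 m edge recovers the 0.737 km² figure. A small sketch of both directions (the helper names are illustrative, and real H3 cells are only approximately regular hexagons):

```python
import math

HEX_FACTOR = 3 * math.sqrt(3) / 2  # regular hexagon area divided by edge²


def hex_area_km2(edge_m):
    """Area of a regular hexagon with the given edge length in meters."""
    edge_km = edge_m / 1000
    return HEX_FACTOR * edge_km**2


def hex_edge_m(area_km2):
    """Edge length in meters of a regular hexagon with the given area."""
    return math.sqrt(area_km2 / HEX_FACTOR) * 1000


print(f"{hex_area_km2(532.6):.3f} km²")  # 0.737 km²
print(f"{hex_edge_m(0.737):.1f} m")  # 532.6 m
```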

5. Using the recommendation in a query:
--------------------------------------------------------------------------------

Executed query with precision 9
Found 10 cells (limited to 10 for display)
Average cell area: 0.109 km²

Sample cells:
  89283080c83ffff - 0.109 km²
  89283080c93ffff - 0.109 km²
  89283080c97ffff - 0.109 km²
  89283082127ffff - 0.109 km²
  8928308212fffff - 0.109 km²

================================================================================
Summary: Choosing the Right Strategy
================================================================================

1. Use-Case Based (Strategy 3):
   → Best for: Standard spatial analysis tasks
   → Pros: High confidence, battle-tested presets
   → Cons: Less control over exact cell size

2. Area-Based (Strategy 1):
   → Best for: When cell size matters (e.g., land parcels, service areas)
   → Pros: Precise control over cell area
   → Cons: May not account for cell count in region

3. Count-Based (Strategy 2):
   → Best for: When you need specific number of divisions
   → Pros: Predictable cell count for budgeting/planning
   → Cons: Cell sizes may vary across region

4. Distance-Based (Strategy 4):
   → Best for: Grid-like applications, routing, proximity analysis
   → Pros: Intuitive edge length specification
   → Cons: Approximation for non-square cells

5. Performance-Based (Strategy 5):
   → Best for: Real-time apps, constrained compute environments
   → Pros: Balances detail vs speed
   → Cons: May sacrifice precision for performance

General guidance:
- Start with use-case presets (Strategy 3) for common tasks
- Use area/distance (1/4) when you have specific size requirements
- Use count-based (2) for bounded cell count needs
- Use performance-based (5) when speed is critical

import pandas as pd

from m3s import GridBuilder, PrecisionSelector

pd.set_option("display.max_columns", None)
pd.set_option("display.width", 120)

print("=" * 80)
print("M3S Intelligent Precision Selection - All 5 Strategies")
print("=" * 80)
print()

# Initialize selector for H3 grid system
selector = PrecisionSelector("h3")

# ============================================================================
# Strategy 1: Area-Based Selection
# ============================================================================

print("=" * 80)
print("Strategy 1: Area-Based Selection")
print("=" * 80)
print("\nUse when: You know the desired cell size in km²")
print("Examples: 'I need cells around 10 km²', 'Show me 100 hectare cells'")
print()

# Find precision for different target areas
target_areas = [1000.0, 100.0, 10.0, 1.0, 0.1]

print("Finding precision for various target areas:")
print("-" * 80)

for target_area in target_areas:
    rec = selector.for_area(target_area_km2=target_area, tolerance=0.3)

    deviation = (
        abs(rec.actual_area_km2 - target_area) / target_area * 100
        if target_area > 0
        else 0
    )

    print(
        f"Target: {target_area:8.1f} km² → Precision: {rec.precision:2d} "
        f"(Actual: {rec.actual_area_km2:8.3f} km², Deviation: {deviation:5.1f}%, "
        f"Confidence: {rec.confidence:.0%})"
    )

print()

# ============================================================================
# Strategy 2: Count-Based Selection
# ============================================================================

print("=" * 80)
print("Strategy 2: Count-Based Selection")
print("=" * 80)
print("\nUse when: You want a specific number of cells in a region")
print("Examples: 'Split this city into ~100 cells', 'I want about 1000 cells here'")
print()

# Manhattan bounding box: (min_lat, min_lon, max_lat, max_lon)
manhattan_bounds = (40.70, -74.05, 40.85, -73.90)

target_counts = [10, 50, 100, 500]

print("Finding precision for Manhattan area with different target counts:")
print("-" * 80)

for target_count in target_counts:
    rec = selector.for_region_count(
        bounds=manhattan_bounds, target_count=target_count, tolerance=0.4
    )

    deviation = (
        abs(rec.actual_cell_count - target_count) / target_count * 100
        if target_count > 0
        else 0
    )

    print(
        f"Target: {target_count:4d} cells → Precision: {rec.precision:2d} "
        f"(Actual: ~{rec.actual_cell_count:4d} cells, Deviation: {deviation:5.1f}%, "
        f"Confidence: {rec.confidence:.0%})"
    )

print()

# ============================================================================
# Strategy 3: Use-Case Based Selection
# ============================================================================

print("=" * 80)
print("Strategy 3: Use-Case Based Selection (Curated Presets)")
print("=" * 80)
print("\nUse when: You have a common spatial analysis use case")
print("Examples: Analyzing neighborhoods, city planning, country-level analysis")
print()

use_cases = [
    "global",
    "continental",
    "country",
    "region",
    "city",
    "neighborhood",
    "street",
    "building",
]

print("H3 precision recommendations for common use cases:")
print("-" * 80)

for use_case in use_cases:
    rec = selector.for_use_case(use_case)
    print(
        f"{use_case:15s} → Precision: {rec.precision:2d} "
        f"(Avg area: {rec.actual_area_km2:12.3f} km², Confidence: {rec.confidence:.0%})"
    )

print()

# Compare across grid systems
print("Same use case ('city') across different grid systems:")
print("-" * 80)

systems = ["geohash", "h3", "s2", "quadkey"]
city_recs = []

for system in systems:
    sel = PrecisionSelector(system)
    rec = sel.for_use_case("city")
    city_recs.append(
        {
            "system": system,
            "precision": rec.precision,
            "area_km2": rec.actual_area_km2,
            "confidence": rec.confidence,
        }
    )

df = pd.DataFrame(city_recs)
print(df.to_string(index=False))
print()

# ============================================================================
# Strategy 4: Distance-Based Selection
# ============================================================================

print("=" * 80)
print("Strategy 4: Distance-Based Selection")
print("=" * 80)
print("\nUse when: You care about cell edge length rather than area")
print("Examples: 'Cells with ~100m edges', 'I need 1km grid spacing'")
print()

target_distances = [10000, 5000, 1000, 500, 100, 50]  # meters

print("Finding precision for various target edge lengths:")
print("-" * 80)

for target_dist in target_distances:
    rec = selector.for_distance(edge_length_m=target_dist, tolerance=0.3)

    deviation = (
        abs(rec.edge_length_m - target_dist) / target_dist * 100
        if target_dist > 0
        else 0
    )

    print(
        f"Target: {target_dist:6d} m → Precision: {rec.precision:2d} "
        f"(Actual: ~{rec.edge_length_m:6.1f} m, Deviation: {deviation:5.1f}%, "
        f"Confidence: {rec.confidence:.0%})"
    )

print()

# ============================================================================
# Strategy 5: Performance-Based Selection
# ============================================================================

print("=" * 80)
print("Strategy 5: Performance-Based Selection")
print("=" * 80)
print("\nUse when: You need to balance precision vs computational cost")
print("Examples: Real-time applications, limited compute budget, large regions")
print()

# Different operation types with time budgets
scenarios = [
    ("point_query", 10.0, 1000.0),  # Fast operation, large region
    ("intersect", 100.0, 500.0),  # Medium operation, medium region
    ("conversion", 200.0, 100.0),  # Expensive operation, small region
]

print("Performance-optimized precision for different scenarios:")
print("-" * 80)

for op_type, time_budget, region_size in scenarios:
    rec = selector.for_performance(
        operation_type=op_type, time_budget_ms=time_budget, region_size_km2=region_size
    )

    print(
        f"{op_type:15s} (budget: {time_budget:5.0f}ms, region: {region_size:6.0f} km²)"
    )
    print(
        f"  → Precision: {rec.precision:2d}, "
        f"Est. cells: {rec.metadata['estimated_cells']:5d}, "
        f"Est. time: {rec.metadata['estimated_time_ms']:5.1f} ms"
    )

print()

# ============================================================================
# Practical Example: Combining Strategies
# ============================================================================

print("=" * 80)
print("Practical Example: Combining Strategies in Real Workflow")
print("=" * 80)
print()

print("Scenario: Analyzing neighborhoods in San Francisco")
print("-" * 80)

# Try multiple strategies and compare
# San Francisco bounding box: (min_lat, min_lon, max_lat, max_lon)
sf_bounds = (37.70, -122.52, 37.82, -122.35)

print("\n1. Use-case based approach:")
rec1 = selector.for_use_case("neighborhood")
print(f"   Precision: {rec1.precision}, Confidence: {rec1.confidence:.0%}")
print(f"   {rec1.explanation}")

print("\n2. Area-based approach (target 0.5 km² cells):")
rec2 = selector.for_area(target_area_km2=0.5)
print(f"   Precision: {rec2.precision}, Confidence: {rec2.confidence:.0%}")
print(f"   Actual area: {rec2.actual_area_km2:.3f} km²")

print("\n3. Count-based approach (target 200 cells):")
rec3 = selector.for_region_count(bounds=sf_bounds, target_count=200)
print(f"   Precision: {rec3.precision}, Confidence: {rec3.confidence:.0%}")
print(f"   Estimated cells: {rec3.actual_cell_count}")

print("\n4. Distance-based approach (target 500m edges):")
rec4 = selector.for_distance(edge_length_m=500)
print(f"   Precision: {rec4.precision}, Confidence: {rec4.confidence:.0%}")
print(f"   Actual edge length: {rec4.edge_length_m:.1f} m")

print("\nComparing all recommendations:")
print("-" * 80)
comparison = pd.DataFrame(
    [
        {
            "Strategy": "Use-case",
            "Precision": rec1.precision,
            "Confidence": f"{rec1.confidence:.0%}",
            "Area (km²)": f"{rec1.actual_area_km2:.3f}",
        },
        {
            "Strategy": "Area-based",
            "Precision": rec2.precision,
            "Confidence": f"{rec2.confidence:.0%}",
            "Area (km²)": f"{rec2.actual_area_km2:.3f}",
        },
        {
            "Strategy": "Count-based",
            "Precision": rec3.precision,
            "Confidence": f"{rec3.confidence:.0%}",
            "Area (km²)": f"{rec3.metadata.get('region_area_km2', 0) / rec3.actual_cell_count:.3f}",
        },
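        {
            "Strategy": "Distance-based",
            "Precision": rec4.precision,
            "Confidence": f"{rec4.confidence:.0%}",
            # Rough estimate: edge² is the area of a square. A regular hexagon
            # with the same edge covers (3√3/2) ≈ 2.6 times that area.
            "Area (km²)": "~" + f"{(rec4.edge_length_m/1000)**2:.3f}",
        },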
    ]
)
print(comparison.to_string(index=False))

print("\n5. Using the recommendation in a query:")
print("-" * 80)

# Use the highest-confidence recommendation
best_rec = max([rec1, rec2, rec3, rec4], key=lambda r: r.confidence)

result = (
    GridBuilder.for_system("h3")
    .with_auto_precision(best_rec)
    .in_bbox(37.75, -122.45, 37.80, -122.40)  # Small SF area
    .limit(10)
    .execute()
)

print(f"\nExecuted query with precision {best_rec.precision}")
print(f"Found {len(result)} cells (limited to 10 for display)")
print(
    f"Average cell area: {sum(c.area_km2 for c in result.many) / len(result):.3f} km²"
)

# Display cells
print("\nSample cells:")
for cell in result.many[:5]:
    print(f"  {cell.identifier} - {cell.area_km2:.3f} km²")

print()
print("=" * 80)
print("Summary: Choosing the Right Strategy")
print("=" * 80)
print(
    """
1. Use-Case Based (Strategy 3):
   → Best for: Standard spatial analysis tasks
   → Pros: High confidence, battle-tested presets
   → Cons: Less control over exact cell size

2. Area-Based (Strategy 1):
   → Best for: When cell size matters (e.g., land parcels, service areas)
   → Pros: Precise control over cell area
   → Cons: May not account for cell count in region

3. Count-Based (Strategy 2):
   → Best for: When you need specific number of divisions
   → Pros: Predictable cell count for budgeting/planning
   → Cons: Cell sizes may vary across region

4. Distance-Based (Strategy 4):
   → Best for: Grid-like applications, routing, proximity analysis
   → Pros: Intuitive edge length specification
   → Cons: Approximation for non-square cells

5. Performance-Based (Strategy 5):
   → Best for: Real-time apps, constrained compute environments
   → Pros: Balances detail vs speed
   → Cons: May sacrifice precision for performance

General guidance:
- Start with use-case presets (Strategy 3) for common tasks
- Use area/distance (1/4) when you have specific size requirements
- Use count-based (2) for bounded cell count needs
- Use performance-based (5) when speed is critical
"""
)

Total running time of the script: (0 minutes 22.953 seconds)
