actuarialpy
Experience analysis on a single tidy table. You build one DataFrame of claims/expense,
revenue, and exposure by period, and Experience gives you views (by, rolling,
trend, completion, seasonality, credibility, pooling) without re-pivoting. The only
dependencies are numpy and pandas; scipy is not required.
Quickstart
```python
import pandas as pd
import actuarialpy as ap

df = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=6, freq="M").astype(str),
    "product": ["PPO"] * 6,
    "paid": [120_000, 118_000, 125_000, 130_000, 128_000, 135_000],
    "premium": [150_000] * 6,
    "member_months": [1000, 1005, 1010, 1008, 1012, 1015],
})

exp = ap.Experience(df, expense="paid", revenue="premium",
                    exposure="member_months", date="month")

exp.by("product")                           # grouped view
exp.loss_ratio                              # paid / premium
ap.pmpm(df["paid"], df["member_months"])    # per-member-per-month
ap.loss_ratio(df["paid"], df["premium"])    # as a free function
```
Retention primitives
The pooling module includes two general retention-stability primitives:
- retained_cv(outcomes, retention, n_units=1): coefficient of variation of the retained aggregate of n_units i.i.d. units, each capped at retention.
- retention_for_target_cv(outcomes, n_units, target_cv, ...): inverts it, giving the retention at which the retained CV hits a target. This is the basis for a size-graded pooling schedule.
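To make the two definitions concrete, here is a minimal sketch of the arithmetic, not the library's implementation. It assumes outcomes is an empirical sample of per-unit losses with a positive minimum and uses population (ddof=0) moments; the _sketch function names and the bisection search are illustrative assumptions. Because the n_units capped units are i.i.d., the CV of their sum is the single-unit capped CV divided by sqrt(n_units).

```python
import numpy as np


def retained_cv_sketch(outcomes, retention, n_units=1):
    """CV of the sum of n_units i.i.d. units, each capped at `retention`."""
    capped = np.minimum(np.asarray(outcomes, dtype=float), retention)
    # Population moments of the empirical per-unit distribution; the CV of an
    # i.i.d. sum of n units is the single-unit CV divided by sqrt(n).
    return capped.std() / capped.mean() / np.sqrt(n_units)


def retention_for_target_cv_sketch(outcomes, n_units, target_cv, tol=1e-10):
    """Bisection for a retention whose retained CV equals target_cv.

    At retention = min(outcomes) every unit is capped to the same value
    (CV 0); at max(outcomes) nothing is capped. The retained CV is continuous
    in the retention, so bisection finds a crossing between those endpoints.
    """
    x = np.asarray(outcomes, dtype=float)
    lo, hi = x.min(), x.max()
    if not 0.0 <= target_cv <= retained_cv_sketch(x, hi, n_units):
        raise ValueError("target_cv is outside the attainable range")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if retained_cv_sketch(x, mid, n_units) < target_cv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


losses = [1.0, 2.0, 3.0, 4.0, 100.0]
retention = retention_for_target_cv_sketch(losses, n_units=4, target_cv=0.2)
```

Since the CV shrinks with sqrt(n_units), a larger group can typically carry a higher retention for the same target CV, which is what makes a size-graded schedule possible.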
API reference
actuarialpy
ActuarialPy: tools for actuarial experience analysis.
Experience (dataclass)
Bind an experience dataset to its actuarial column roles.
Experience is the recommended entry point for repeated experience-analysis
workflows. It stores common column roles once and delegates calculations to
the package's free functions. The object is immutable: methods return
DataFrames or new Experience objects rather than changing stored data in
place.
Bind count (a claim or service count) to unlock the frequency-severity views:
frequency_severity and decompose_trend (utilization x unit cost, optionally x mix).
fit_trend fits an exponential trend to the bound (ideally completed) history by
log-linear regression.
Grain matters. Experience aggregates by summing the bound columns, so it
expects rows at the grain of the exposure unit -- one row per member-month, with
member_months = 1 (or the eligible fraction). If your data is long (one row per
service line, so the same member-month repeats across several rows), summing the
exposure column overcounts it, and every per-exposure figure -- PMPM, frequency, the
loss-ratio denominator -- is wrong by the number of rows per member-month. Experience
does not detect this: it has no member key, so it cannot tell a long frame from a wide
one. For long or multi-table warehouse data, either aggregate to member-month grain
first, or use bind, which sources exposure from a correctly-grained table (e.g.
eligibility) via actuarialpy.Count and never sums a repeated column.
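A minimal sketch of the first remedy, collapsing a long frame to member-month grain with plain pandas before binding it. The column names (member_id, service_line, and so on) are hypothetical; the point is that amounts are summed while exposure is taken once per member-month.

```python
import pandas as pd

# Hypothetical long extract: one row per service line, so member A's
# January member-month appears three times with member_months = 1 each time.
long = pd.DataFrame({
    "member_id": ["A", "A", "A", "B"],
    "month": ["2024-01", "2024-01", "2024-01", "2024-01"],
    "service_line": ["IP", "OP", "RX", "RX"],
    "paid": [900.0, 150.0, 50.0, 30.0],
    "member_months": [1, 1, 1, 1],
})

# Summing exposure on the long frame overcounts it: 4 instead of 2.
naive_pmpm = long["paid"].sum() / long["member_months"].sum()

# Collapse to member-month grain: sum the amounts, take exposure once.
mm = long.groupby(["member_id", "month"], as_index=False).agg(
    paid=("paid", "sum"),
    member_months=("member_months", "first"),
)
correct_pmpm = mm["paid"].sum() / mm["member_months"].sum()
```

Here the naive PMPM is half the correct one, because every member-month of member A was counted three times. The collapsed mm frame has one row per member-month, the grain Experience expects.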
Source code in actuarialpy/frame.py
with_roles
with_roles(
*,
data: DataFrame | None = None,
expense: str | list[str] | None = None,
revenue: str | list[str] | None = None,
exposure: str | list[str] | None = None,
date: str | None = None,
profile: str | None = None,
count: str | None = None,
copy: bool | None = None
) -> "Experience"
Return a new Experience object with updated data or roles.
Source code in actuarialpy/frame.py
filter
Return a new Experience object over a filtered dataset.
Use either a boolean mask or a pandas query string.
Source code in actuarialpy/frame.py
deseasonalize
deseasonalize(
factors: Series,
*,
columns: str | list[str] | None = None,
freq: str = "M",
by: str | list[str] | None = None,
date_col: str | None = None
) -> "Experience"
Return a new Experience with the seasonal pattern divided out.
Each selected column is divided by its row's seasonal factor (as produced by
seasonality_factors) and written back under the same name, so every
downstream view (trend, rolling, by, and the rest) then operates on the
deseasonalized series. By default the expense (loss / claims) columns are
adjusted; pass columns to choose others. Only the numerator is touched: exposure
is left alone, so a deseasonalized PMPM is simply deseasonalized claims over
unchanged member months.
factors may be a flat Series (one pattern) or a tidy per-segment table from
seasonality_factors_by; with the latter, pass by naming the grouping
column(s) so the join is on group plus season. Estimate factors on the broader
pool, not on this object's own (often thin) data. To put the pattern back, apply
apply_seasonality to .data.
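The core operation is a per-row division by the factor for that row's season. A minimal pandas sketch, not the library's code, assuming factors are keyed by calendar month number (1 to 12); the key convention that seasonality_factors actually uses may differ, and the factor values here are made up.

```python
import pandas as pd

# Hypothetical factors keyed by calendar month (only Jan-Mar shown).
factors = pd.Series({1: 1.10, 2: 0.95, 3: 0.95})

df = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "paid": [110_000.0, 95_000.0, 95_000.0],
    "member_months": [1000, 1000, 1000],
})

season = pd.to_datetime(df["month"]).dt.month.astype("int64")  # season key per row
out = df.copy()
out["paid"] = df["paid"] / season.map(factors)  # numerator only

# member_months is untouched, so PMPM on `out` is deseasonalized paid / exposure.
pmpm = out["paid"] / out["member_months"]
```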
Source code in actuarialpy/frame.py
complete
complete(
factors: Series,
*,
valuation_date: Any = None,
columns: str | list[str] | None = None,
development_col: str | None = None,
by: str | list[str] | None = None,
date_col: str | None = None
) -> "Experience"
Return a new Experience with paid amounts developed to ultimate.
Grosses the expense (loss / claims) columns up to estimated ultimate
(completed = paid / completion_factor) and writes them back under the same
names, so downstream views (trend, rolling, by, ...) then run on the completed
series. Each row's development period is development_months(date, valuation_date),
the convention make_completion_triangle uses, or an explicit development_col. The
join is by value, so the frame's index is irrelevant. Rows past the triangle's
last development period are taken as fully complete, so only recent, immature
months actually move.
factors may be a flat Series (one pattern, from completion_factors) or a tidy
per-segment table from completion_factors_by; with the latter, pass by naming the
grouping column(s) so the join is on group plus development period. Only the
numerator is developed; exposure is left untouched. This applies to the
latest-diagonal shape (one row per incurred month, claims paid to date as of
valuation_date); a frame already on an ultimate basis must not be completed again.
Source code in actuarialpy/frame.py
adjust
adjust(
factors: float | int | Series | DataFrame,
*,
on: str | list[str] | None = None,
columns: str | list[str] | None = None,
by: str | list[str] | None = None,
how: str = "multiply",
factor_col: str = "factor",
audit_col: str | None = None,
default: float | None = None
) -> "Experience"
Return a new Experience with an expense column restated by a factor.
The general counterpart to complete and deseasonalize: joins a factor on the
key column on (a column already in the frame, optionally within by segments)
and multiplies (or, with how="divide", divides) the selected column(s), writing
them back under the same name so every downstream view composes on the restated
series. factors is a scalar (one factor for all rows), a Series indexed by on,
or a tidy DataFrame keyed by by + on.
This is the spine of experience-period restatement (trend, benefit / area /
demographic relativities, network discounts), where the methodology is supplied
as the factors rather than encoded here. Chain freely
(exp.complete(...).adjust(trend).adjust(area, on="region")); with audit_col
the cumulative restatement multiplier is carried across the chain, one value per
row, for a reviewable audit trail. An absent key surfaces as NaN unless
default is given (default=1.0 means "no adjustment for this key").
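A minimal sketch of the join-and-multiply mechanics and the cumulative audit multiplier, written as a hypothetical adjust_sketch function over a plain DataFrame rather than an Experience. It covers only the scalar and Series-indexed-by-on forms with multiplication; the per-segment DataFrame form and how="divide" are left out, and the audit column name "restatement" is an assumption.

```python
import pandas as pd


def adjust_sketch(frame, factors, *, column, on=None, default=None,
                  audit_col="restatement"):
    """Multiply `column` by a factor joined on `on` (or a scalar), tracking the
    cumulative multiplier in `audit_col`. Hypothetical; not the library's adjust."""
    out = frame.copy()
    if on is None:
        f = pd.Series(float(factors), index=out.index)  # scalar: same factor on every row
    else:
        f = out[on].map(factors)                        # Series indexed by key: join by value
    if default is not None:
        f = f.fillna(default)                           # e.g. 1.0 = "no adjustment"
    out[column] = out[column] * f
    prior = out[audit_col] if audit_col in out.columns else 1.0
    out[audit_col] = prior * f                          # cumulative across the chain
    return out


df = pd.DataFrame({"region": ["N", "S", "W"], "paid": [100.0, 200.0, 300.0]})
area = pd.Series({"N": 1.10, "S": 0.90})                # no relativity for "W"

restated = (
    df.pipe(adjust_sketch, 1.05, column="paid")                       # trend
      .pipe(adjust_sketch, area, column="paid", on="region", default=1.0)
)
```

Without default=1.0, region "W" would come through as NaN, which is the surfaced-gap behaviour described above.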
Source code in actuarialpy/frame.py
by
Summarize experience by optional grouping columns.
Source code in actuarialpy/frame.py
views
Create several named grouped experience views.
Source code in actuarialpy/frame.py
rolling
rolling(
window: int = 12,
*,
groupby: str | list[str] | None = None,
date_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Create a rolling-period experience summary.
Source code in actuarialpy/frame.py
trend
trend(
*,
amount_col: str | None = None,
exposure_col: str | None = None,
groupby: str | list[str] | None = None,
date_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Compare amount or per-exposure experience between two periods.
Source code in actuarialpy/frame.py
frequency_severity
frequency_severity(
*,
count_col: str | None = None,
loss_col: str | None = None,
exposure_col: str | None = None,
groupby: str | list[str] | None = None,
annualization: float = 12
) -> pd.DataFrame
Per-group claim frequency, severity, and PMPM (see frequency_severity_summary).
Uses the bound count, expense (as the loss), and exposure roles, so the
columns are specified once on the object. The identity pmpm == frequency *
severity holds for every row.
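The identity holds because claim counts cancel: (count / exposure) * (loss / count) = loss / exposure. A small pandas check with frequency measured per member month (unannualized); how frequency_severity applies its annualization argument to its own output columns is not shown here, and the column names are hypothetical.

```python
import pandas as pd

grp = pd.DataFrame({
    "product": ["PPO", "HMO"],
    "claims": [480, 300],              # claim or service counts
    "paid": [120_000.0, 60_000.0],
    "member_months": [1000, 800],
})

grp["frequency"] = grp["claims"] / grp["member_months"]  # claims per member month
grp["severity"] = grp["paid"] / grp["claims"]            # cost per claim
grp["pmpm"] = grp["paid"] / grp["member_months"]

# Annualized utilization per 1,000 members is a rescaling of the same frequency.
grp["util_per_1000"] = grp["frequency"] * 12 * 1000
```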
Source code in actuarialpy/frame.py
decompose_trend
decompose_trend(
*,
count_col: str | None = None,
loss_col: str | None = None,
exposure_col: str | None = None,
mix_by: str | Iterable[str] | None = None,
groupby: str | list[str] | None = None,
period_col: str | None = None,
prior_period: Any = None,
current_period: Any = None,
date_col: str | None = None,
prior_start: Any = None,
prior_end: Any = None,
current_start: Any = None,
current_end: Any = None,
prior_filter: Any = None,
current_filter: Any = None,
annualization: float = 12
) -> pd.DataFrame
Decompose the PMPM trend between two periods of the bound data.
Splits the bound frame into prior and current periods with the same comparison
modes as trend: period_col with prior_period / current_period; a date_col with
prior/current ranges (the bound date is used when no date_col is passed); or
explicit prior_filter / current_filter masks. It then decomposes the change via
decompose_pmpm_trend, using the bound count, expense (as the loss), and exposure
roles. Pass mix_by to add the third LMDI mix term; pass groupby to report one
decomposition per group.
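For the two-factor case (utilization x unit cost, no mix term), additive LMDI weights each factor's log change by the logarithmic mean of the two PMPMs, so the pieces sum exactly to the PMPM change. A minimal sketch of that formula; it is a textbook LMDI-I decomposition, not decompose_pmpm_trend itself, and the function names and dictionary keys are hypothetical.

```python
import math


def log_mean(a, b):
    """Logarithmic mean (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b):
        return a
    return (a - b) / (math.log(a) - math.log(b))


def lmdi_two_factor(freq0, sev0, freq1, sev1):
    """Additive LMDI-I split of the PMPM change into utilization and unit cost."""
    pmpm0, pmpm1 = freq0 * sev0, freq1 * sev1
    w = log_mean(pmpm1, pmpm0)
    return {
        "change": pmpm1 - pmpm0,
        "utilization": w * math.log(freq1 / freq0),
        "unit_cost": w * math.log(sev1 / sev0),
    }


# Prior: 0.50 claims per member month at $240 per claim (PMPM 120);
# current: 0.52 at $250 (PMPM 130).
parts = lmdi_two_factor(freq0=0.50, sev0=240.0, freq1=0.52, sev1=250.0)
```

The logarithmic-mean weight is what makes the decomposition exact with no residual term, since w * (ln(f1/f0) + ln(s1/s0)) = w * ln(P1/P0) = P1 - P0.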
Source code in actuarialpy/frame.py
fit_trend
fit_trend(
*,
value_col: str | None = None,
exposure_col: str | None = None,
date_col: str | None = None,
freq: str = "M",
min_periods: int = 3,
confidence: float = 0.95
) -> TrendFit
Fit an exponential trend to the bound experience by log-linear regression.
Defaults to the bound expense (claims) over the bound exposure, i.e. the PMPM
trend, across the bound date; pass value_col / exposure_col to override, or
leave the exposure unbound to trend the raw amount. Returns a TrendFit (see
fit_trend). Run it on completed, deseasonalized history.
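A minimal sketch of the point estimate behind TrendFit, assuming a monthly series and t measured in years from the first period, as TrendFit describes. r_squared, the standard error, and the confidence interval are omitted, and fit_log_linear_trend is a hypothetical name, not the library's fit_trend.

```python
import numpy as np


def fit_log_linear_trend(values, periods_per_year=12):
    """Fit log(value) = intercept + slope * t with t in years from the first
    period; return (annual_trend, intercept, slope), annual_trend = exp(slope) - 1."""
    y = np.log(np.asarray(values, dtype=float))
    t = np.arange(len(y)) / periods_per_year
    slope, intercept = np.polyfit(t, y, 1)  # highest degree first
    return np.exp(slope) - 1.0, intercept, slope


# 24 months of PMPM growing at exactly 6% a year.
t = np.arange(24) / 12
pmpm = 400.0 * 1.06 ** t
annual_trend, intercept, slope = fit_log_linear_trend(pmpm)

# Trend factor over 18 months at the fitted rate, as TrendFit.factor describes.
factor_18m = (1 + annual_trend) ** (18 / 12)
```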
Source code in actuarialpy/frame.py
components
components(
component_cols: str | list[str],
*,
exposure_col: str | None = None,
groupby: str | list[str] | None = None,
date_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Explain component drivers between two periods.
Source code in actuarialpy/frame.py
component_summary
component_summary(
component_cols: str | list[str],
*,
groupby: str | list[str] | None = None,
exposure_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Summarize component amounts, per-exposure values, and shares.
Source code in actuarialpy/frame.py
actual_vs_expected
actual_vs_expected(
expected: str | list[str],
*,
actual: str | list[str] | None = None,
groupby: str | list[str] | None = None,
exposure: str | list[str] | None = None,
**kwargs: Any
) -> pd.DataFrame
Summarize actual-versus-expected experience.
If actual is omitted, the object's bound expense columns are used.
Source code in actuarialpy/frame.py
claimants
claimants(
claimant_col: str,
*,
amount_cols: str | list[str] | None = None,
groupby: str | list[str] | None = None,
exposure_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Aggregate the experience to claimant/member/risk level.
Source code in actuarialpy/frame.py
top_claimants
top_claimants(
claimant_col: str,
*,
amount_cols: str | list[str] | None = None,
amount_col: str | None = None,
groupby: str | list[str] | None = None,
n: int = 25,
**kwargs: Any
) -> pd.DataFrame
Return top claimants by amount.
Source code in actuarialpy/frame.py
claimant_concentration
claimant_concentration(
claimant_col: str,
*,
amount_cols: str | list[str] | None = None,
groupby: str | list[str] | None = None,
**kwargs: Any
) -> pd.DataFrame
Summarize how concentrated experience is among top claimants.
Source code in actuarialpy/frame.py
cohort
cohort(
*,
entity_col: str,
start_date_col: str,
duration_months: int = 12,
groupby: str | list[str] | None = None,
date_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Summarize each entity's first N months or cohort-duration window.
Source code in actuarialpy/frame.py
duration
duration(
*,
entity_col: str,
start_date_col: str,
max_duration_month: int | None = None,
date_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Summarize experience by duration month since entity start.
Source code in actuarialpy/frame.py
by_status
Summarize experience by a status column.
Source code in actuarialpy/frame.py
with_status
with_status(
*,
effective_col: str,
as_of: Any,
termination_col: str | None = None,
first_year_months: int = 12,
status_col: str = "status",
labels: dict[str, str] | None = None
) -> "Experience"
Return a new Experience with a derived lifecycle status column.
Derives active / first-year / termed from effective and termination dates
as of a reference date (see actuarialpy.derive_status). Summarize
the result with by_status.
Source code in actuarialpy/frame.py
by_band
Summarize experience by a size band on value_col (see summarize_by_band).
Source code in actuarialpy/frame.py
margin
margin(
groupby: str | list[str] | None = None,
*,
margin_col: str = "margin",
ratio_col: str = "margin_ratio",
per_exposure_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Underwriting margin (revenue net of expense) by optional grouping.
Aggregates the bound expense and revenue roles with by, then adds
the margin (total_revenue - total_expense), the margin ratio, and an
optional per-exposure margin.
Source code in actuarialpy/frame.py
credibility_weighted
credibility_weighted(
groupby: str | list[str],
*,
z: Any,
metric: str = "loss_ratio",
complement: float | None = None,
out_col: str | None = None,
**kwargs: Any
) -> pd.DataFrame
Blend each group's metric with a complement at credibility z.
Computes the grouped summary (by), then blends metric toward complement using z,
as z * metric + (1 - z) * complement (see
actuarialpy.credibility_weighted_estimate). z may be a scalar or values aligned
to the grouped rows. When complement is omitted, the book-level value of metric
is used as the complement of credibility.
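A pandas sketch of the blend on an already-grouped summary, assuming the book-level complement is the ratio of totals (total expense over total revenue) rather than an average of the group ratios. The column names, the z values, and the loss_ratio_credible output name are hypothetical.

```python
import pandas as pd

# Hypothetical grouped summary: one row per product.
g = pd.DataFrame({
    "product": ["PPO", "HMO", "EPO"],
    "total_expense": [800_000.0, 450_000.0, 40_000.0],
    "total_revenue": [1_000_000.0, 500_000.0, 50_000.0],
})
g["loss_ratio"] = g["total_expense"] / g["total_revenue"]

# Book-level complement: ratio of totals, not the mean of the group ratios.
complement = g["total_expense"].sum() / g["total_revenue"].sum()

z = pd.Series([1.0, 0.8, 0.2])  # credibility aligned to the grouped rows
g["loss_ratio_credible"] = z * g["loss_ratio"] + (1 - z) * complement
```

The thin EPO segment (z = 0.2) is pulled most of the way to the book, while the fully credible PPO keeps its own ratio.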
Source code in actuarialpy/frame.py
pool_claimants
pool_claimants(
claimant_col: str,
pooling_point: float,
*,
amount_cols: str | list[str] | None = None,
groupby: str | list[str] | None = None,
amount_name: str = "total_expense",
**kwargs: Any
) -> pd.DataFrame
Aggregate to claimant level and split each claimant into pooled/excess.
Summarizes the experience to claimant grain (claimants) and caps each claimant's
total at pooling_point (see actuarialpy.pool_losses), returning pooled and excess
columns: the pooled (capped) amounts feed capped experience, and the excess is the
hand-off to tail modeling.
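Pooling caps each claimant's total, not each claim, so claims are summed to claimant grain first. A minimal pandas sketch with hypothetical column names; pool_losses may name or order its output columns differently.

```python
import pandas as pd

claims = pd.DataFrame({
    "member_id": ["A", "A", "B", "C"],
    "paid": [150_000.0, 120_000.0, 40_000.0, 90_000.0],
})
pooling_point = 250_000.0

# Claimant grain first: the cap applies to each claimant's total.
per_claimant = claims.groupby("member_id", as_index=False)["paid"].sum()
per_claimant["pooled"] = per_claimant["paid"].clip(upper=pooling_point)
per_claimant["excess"] = per_claimant["paid"] - per_claimant["pooled"]
```

Capping per claim instead would leave member A, whose two claims are each under the pooling point, entirely unpooled.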
Source code in actuarialpy/frame.py
ChainLadder (dataclass)
Chain-ladder development pattern fitted from a cumulative triangle.
Fit with the fit classmethod from a cumulative development triangle (for example
the output of make_completion_triangle with cumulative=True). The fitted pattern exposes:
- age_to_age: link (age-to-age) factors, indexed by their starting development period.
- cdf: cumulative development factor to ultimate by development period, including the tail.
- completion_factors: 1 / cdf by development period, the proportion of ultimate emerged by each development period. These are divide-convention factors in (0, 1] (completed = paid / factor), so they line up with validate_completion_factors and downstream completion.
Use project to apply the pattern to a triangle and get per-origin
ultimate and IBNR.
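A minimal sketch of the volume-weighted pattern and the projection step on a small cumulative triangle (origins as rows, development months as columns, NaN where not yet observed). It assumes the tail multiplies the CDF at every age and is itself the CDF at the last observed age; it mirrors the quantities described here, not ChainLadder's actual code, and volume_weighted_ata is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Cumulative paid triangle: origin months down, development months across.
tri = pd.DataFrame(
    {
        0: [100.0, 110.0, 120.0],
        1: [150.0, 165.0, np.nan],
        2: [165.0, np.nan, np.nan],
    },
    index=["2024-01", "2024-02", "2024-03"],
)


def volume_weighted_ata(tri):
    """Link factor k -> k+1 = sum(C[k+1]) / sum(C[k]) over origins observed at both ages."""
    cols = list(tri.columns)
    factors = {}
    for k, k_next in zip(cols[:-1], cols[1:]):
        both = tri[[k, k_next]].dropna()
        factors[k] = both[k_next].sum() / both[k].sum()
    return pd.Series(factors)


ata = volume_weighted_ata(tri)
tail = 1.0  # assumed multiplicative tail beyond the last observed age

# CDF to ultimate at each age: product of the remaining link factors, times the tail.
cdf = pd.Series(
    [ata.loc[k:].prod() * tail for k in tri.columns[:-1]] + [tail],
    index=tri.columns,
)
completion = 1.0 / cdf  # proportion of ultimate emerged, in (0, 1]

# Projection: each origin's latest cumulative times the CDF at its latest age.
latest_age = tri.apply(lambda row: row.last_valid_index(), axis=1)
latest = tri.apply(lambda row: row[row.last_valid_index()], axis=1)
ultimate = latest * latest_age.map(cdf)
ibnr = ultimate - latest
```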
Source code in actuarialpy/reserving.py
fit (classmethod)
Estimate the development pattern from a cumulative triangle.
method is "volume" (volume-weighted age-to-age factors, the
default) or "simple" (straight average of individual link ratios).
tail (>= 1) extends development beyond the latest observed development period.
Source code in actuarialpy/reserving.py
project
Project ultimate and IBNR per origin by applying the fitted pattern.
For each origin, takes its latest observed cumulative amount and multiplies by the cumulative development factor at that development period. Returns one row per origin with the latest development period, latest cumulative, development factor applied, ultimate, and IBNR (ultimate minus latest).
Source code in actuarialpy/reserving.py
InsufficientDataWarning
Bases: UserWarning
Emitted when a segment has too little data to fit and is skipped or aggregated.
Filter it with the standard warnings module machinery, e.g.
warnings.filterwarnings("ignore", category=InsufficientDataWarning).
Source code in actuarialpy/reserving.py
Buhlmann
Bühlmann credibility model.
This implementation assumes each risk has the same number of observations.
Parameters
overall_mean : float
    Estimated collective mean.
epv : float
    Estimated expected process variance (EPV).
vhm : float
    Estimated variance of hypothetical means (VHM).
n_obs : int
    Number of observations per risk.
Source code in actuarialpy/credibility.py
premium
Compute the Bühlmann credibility premium Z * risk_mean + (1 - Z) * overall_mean,
where Z = n_obs / (n_obs + epv / vhm).
Parameters
risk_mean : float or array-like
    Risk-specific sample mean(s).
Returns
float or numpy.ndarray
    Credibility-weighted premium(s).
Source code in actuarialpy/credibility.py
fit (classmethod)
Fit a Bühlmann credibility model from data.
Parameters
data : array-like, shape (m, n)
    Observations for m risks, each with n observations.
Returns
Buhlmann
    Fitted Bühlmann model.
Notes
Estimators used:
- overall_mean = mean of all observations
- EPV = average of within-risk sample variances
- VHM = sample variance of risk means minus EPV / n, floored at 0
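A sketch of those estimators with numpy, assuming ddof=1 for both sample variances; buhlmann_fit_sketch and buhlmann_premium are hypothetical names. Z is computed as n * vhm / (n * vhm + epv), which equals n / (n + epv / vhm) and gives Z = 0 when VHM is floored at 0.

```python
import numpy as np


def buhlmann_fit_sketch(data):
    """Return (overall_mean, epv, vhm) for an (m, n) array of observations."""
    x = np.asarray(data, dtype=float)
    _, n = x.shape
    overall_mean = x.mean()
    epv = x.var(axis=1, ddof=1).mean()                    # mean within-risk sample variance
    vhm = max(x.mean(axis=1).var(ddof=1) - epv / n, 0.0)  # floored at 0
    return overall_mean, epv, vhm


def buhlmann_premium(risk_mean, overall_mean, epv, vhm, n):
    """Z * risk_mean + (1 - Z) * overall_mean with Z = n / (n + epv / vhm)."""
    z = n * vhm / (n * vhm + epv)
    return z * np.asarray(risk_mean, dtype=float) + (1 - z) * overall_mean


overall_mean, epv, vhm = buhlmann_fit_sketch([[1, 2, 3], [4, 5, 6]])
premium = buhlmann_premium(2.0, overall_mean, epv, vhm, n=3)
```

In this example EPV is 1, VHM is 4.5 - 1/3, and Z = 25/27, so the low risk's premium moves only slightly from its own mean of 2 toward the collective mean of 3.5.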
Source code in actuarialpy/credibility.py
BuhlmannStraub
Bühlmann-Straub credibility model.
This implementation allows different exposure weights by risk and period.
Parameters
overall_mean : float
    Estimated collective mean.
epv : float
    Estimated expected process variance (EPV).
vhm : float
    Estimated variance of hypothetical means (VHM).
weights : array-like
    Total weight (exposure) for each risk.
Source code in actuarialpy/credibility.py
z
Credibility factor for a given total risk weight: Z_i = w_i / (w_i + K), where K = epv / vhm.
Parameters
weight : float or array-like
    Total exposure weight(s).
Returns
float or numpy.ndarray
    Credibility factor(s).
Source code in actuarialpy/credibility.py
premium
Compute the Bühlmann-Straub premium Z_i * risk_mean_i + (1 - Z_i) * overall_mean.
Parameters
risk_mean : float or array-like
    Risk-specific weighted mean(s).
weight : float or array-like
    Total exposure weight(s).
Returns
float or numpy.ndarray
    Credibility-weighted premium(s).
Source code in actuarialpy/credibility.py
fit (classmethod)
Fit a Bühlmann-Straub model from observations and weights.
Parameters
data : array-like, shape (m, n)
    Observed values X_ij for m risks and n periods.
weights : array-like, shape (m, n)
    Exposure weights w_ij for m risks and n periods.
Returns
BuhlmannStraub
    Fitted Bühlmann-Straub model.
Notes
Let w_i. = sum_j w_ij, Xbar_i = sum_j w_ij X_ij / w_i., and
overall_mean = sum_i sum_j w_ij X_ij / sum_i sum_j w_ij.
EPV is estimated by [sum_i sum_j w_ij (X_ij - Xbar_i)^2] / [m (n - 1)].
VHM is the weighted sample variance of the risk means around the overall mean, adjusted by EPV and floored at 0. This is a practical implementation intended for equal period counts.
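A sketch of the standard nonparametric Bühlmann-Straub estimators for equal period counts. It assumes the library's VHM adjustment takes the usual textbook form, [sum_i w_i. (Xbar_i - overall_mean)^2 - (m - 1) EPV] / [w - sum_i w_i.^2 / w] with w the total weight; the exact adjustment the library uses may differ, and the function names are hypothetical. With equal weights it reproduces the Bühlmann estimates.

```python
import numpy as np


def buhlmann_straub_fit_sketch(data, weights):
    """Return (overall_mean, epv, vhm, w_i, xbar_i) for (m, n) data and weights."""
    x = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    m, n = x.shape
    w_i = w.sum(axis=1)                         # total weight per risk
    xbar_i = (w * x).sum(axis=1) / w_i          # weighted risk means
    w_tot = w_i.sum()
    overall_mean = (w * x).sum() / w_tot
    epv = (w * (x - xbar_i[:, None]) ** 2).sum() / (m * (n - 1))
    between = (w_i * (xbar_i - overall_mean) ** 2).sum()
    vhm = (between - (m - 1) * epv) / (w_tot - (w_i ** 2).sum() / w_tot)
    return overall_mean, epv, max(vhm, 0.0), w_i, xbar_i


def straub_z(weight, epv, vhm):
    """Z_i = w_i / (w_i + K) with K = epv / vhm, written so VHM = 0 gives Z = 0."""
    weight = np.asarray(weight, dtype=float)
    return weight * vhm / (weight * vhm + epv)


data = [[1, 2, 3], [4, 5, 6]]
_, epv, vhm, w_i, _ = buhlmann_straub_fit_sketch(data, [[1, 1, 1], [10, 10, 10]])
z = straub_z(w_i, epv, vhm)  # the heavier risk gets more credibility
```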
Source code in actuarialpy/credibility.py
TrendFit (dataclass)
Result of fit_trend: an exponential trend fitted to a rate series.
annual_trend is the fitted multiplicative annual trend, exp(slope) - 1, where
slope is the log-scale slope per year. r_squared is the goodness of fit, std_error
the delta-method standard error of annual_trend, and (ci_low, ci_high) its
confidence interval, which is asymmetric because the endpoints are transformed
from the log-scale slope interval. slope and intercept describe the underlying
fit log(value) = intercept + slope * t, with t measured in years from the first period.
Source code in actuarialpy/trend.py
factor
Trend factor over months at the fitted rate: (1 + annual_trend) ** (months / 12).
actual_to_expected
combined_ratio
Calculate combined ratio: (losses + expenses) divided by revenue.
expense_ratio
frequency
indicated_change
loss_ratio
medical_loss_ratio
pepm
per_exposure
permissible_loss_ratio
Permissible (target / break-even) loss ratio.
PLR = 1 - expense_ratio - profit_provision, where both loadings are
expressed as a fraction of premium. Also called the zero-margin or target
loss ratio: the loss ratio at which premium exactly covers losses, expenses,
and the profit/contingency provision. Works element-wise on scalars or
Series. Shops that load fixed expenses on a loss basis instead use
(1 - V - Q) / (1 + G), where V is the variable expense ratio and Q the profit
and contingency provision (both as fractions of premium) and G is the fixed
expense ratio to losses; this function implements the premium-basis form.
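Both forms as plain functions, to show how they relate; the function names are hypothetical. With premium P and losses L, the loss-basis form follows from solving L + G*L + V*P + Q*P = P for L / P, and it collapses to the premium-basis form when G = 0.

```python
def plr_premium_basis(expense_ratio, profit_provision):
    """1 - expense_ratio - profit_provision; both loadings as fractions of premium."""
    return 1.0 - expense_ratio - profit_provision


def plr_loss_basis(variable_expense, profit_provision, fixed_to_loss):
    """(1 - V - Q) / (1 + G), with fixed expense G expressed as a ratio to losses."""
    return (1.0 - variable_expense - profit_provision) / (1.0 + fixed_to_loss)


premium_basis = plr_premium_basis(0.15, 0.03)
loss_basis = plr_loss_basis(0.10, 0.03, 0.05)
```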
Source code in actuarialpy/metrics.py
pmpm
pspm
pure_premium
ratio
required_revenue
safe_divide
Safely divide numerator by denominator.
Scalars return scalars. Array-like inputs return NumPy arrays. Zero denominators
are returned as fill_value.
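A minimal numpy sketch of that behaviour: np.divide with a where mask avoids divide-by-zero warnings and leaves fill_value in the masked slots. The name safe_divide_sketch and its default fill_value of 0.0 are assumptions, not the library's signature.

```python
import numpy as np


def safe_divide_sketch(numerator, denominator, fill_value=0.0):
    """numerator / denominator, with fill_value wherever the denominator is zero."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(num, den).shape, fill_value, dtype=float)
    np.divide(num, den, out=out, where=den != 0)  # masked slots keep fill_value
    return out.item() if out.ndim == 0 else out   # scalars in, scalar out
```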
Source code in actuarialpy/metrics.py
severity
utilization_per_1000
Annualized utilization per 1,000 members.
Returns claim_count / exposure * annualization * 1000. With monthly member
months as exposure the default annualization=12 yields services (admits,
visits, scripts, ...) per 1,000 members per year. If exposure is already in
member-years, pass annualization=1.
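The arithmetic as a one-line function with a worked case; the name is hypothetical. 45 admits over 6,000 member months is 45 / 6000 * 12 * 1000 = 90 admits per 1,000 members per year, the same as 45 admits over 500 member-years with annualization=1.

```python
def utilization_per_1000_sketch(claim_count, exposure, annualization=12):
    """claim_count / exposure * annualization * 1000."""
    return claim_count / exposure * annualization * 1000


monthly = utilization_per_1000_sketch(45, 6000)                 # 6,000 member months
yearly = utilization_per_1000_sketch(45, 500, annualization=1)  # 500 member-years
```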
Source code in actuarialpy/metrics.py
apply_completion
apply_completion(
df: DataFrame,
factors: Series | DataFrame,
*,
value_col: str,
date_col: str | None = None,
valuation_date: Any = None,
development_col: str | None = None,
by: str | list[str] | None = None,
factor_col: str = "completion_factor",
development_name: str = "development_month",
out_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Develop a paid amount to estimated ultimate with completion factors.
For each row the development period is taken from development_col if supplied,
otherwise computed as development_months(df[date_col], valuation_date), the
convention make_completion_triangle uses, so factors from completion_factors or
completion_factors_by join by construction. The completed amount is
paid / factor (the divide convention, factors in (0, 1]).
factors may be either of:
- a flat Series indexed by development period (one pattern for the whole frame), or
- a tidy DataFrame of per-segment factors (grouping column(s), a development-period column named development_name, and a factor column named factor_col; the shape completion_factors_by returns), joined on by plus development period. The table must be unique on by + [development], since a duplicate would fan out the data; this is checked.
The join is by value, never by index alignment, so the frame's own index is irrelevant.
A row past its (group's) largest development period is taken as fully complete
(factor 1.0). A development period inside the fitted range but absent from the
factors stays NaN, surfacing the gap; a row whose group is absent from the factor
table also stays NaN; and a negative development period (incurred after
valuation_date) raises.
Supply either development_col, or both date_col and valuation_date.
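A minimal sketch of the flat-Series path: compute whole-month development, join factors by value, treat rows beyond the largest development period as complete, leave in-range gaps as NaN, and reject negative development. It assumes development is the difference in calendar months (an incurred month valued in the same month is development 0), which may not match development_months exactly; the per-segment table join is omitted and the names are hypothetical.

```python
import pandas as pd


def development_months_sketch(incurred, valuation):
    """Whole calendar months from incurred month to valuation month (assumed convention)."""
    inc = pd.to_datetime(pd.Series(incurred))
    val = pd.Timestamp(valuation)
    return ((val.year - inc.dt.year) * 12 + (val.month - inc.dt.month)).astype("int64")


def apply_completion_sketch(df, factors, *, value_col, date_col, valuation_date):
    """Flat-Series completion: completed = paid / factor, joined by development month."""
    dev = development_months_sketch(df[date_col], valuation_date)
    if (dev < 0).any():
        raise ValueError("a row is incurred after valuation_date")
    f = dev.map(factors)                           # join by value, not by index
    f = f.where(dev <= factors.index.max(), 1.0)   # beyond the triangle: complete
    out = df.copy()
    out[f"{value_col}_completed"] = out[value_col] / f  # in-range gaps stay NaN
    return out


factors = pd.Series({0: 0.50, 1: 0.80, 3: 0.98})  # note: no factor for month 2
claims = pd.DataFrame({
    "incurred": ["2024-06", "2024-05", "2024-04", "2024-03", "2023-12"],
    "paid": [50.0, 80.0, 90.0, 98.0, 100.0],
})
out = apply_completion_sketch(claims, factors, value_col="paid",
                              date_col="incurred", valuation_date="2024-06-30")
```

The April row (development 2) comes out NaN because the pattern has a hole there, while December 2023 (development 6) is past the triangle and passes through unchanged.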
Source code in actuarialpy/reserving.py
chain_ladder_by
chain_ladder_by(
df: DataFrame,
*,
groupby: str | list[str],
origin_col: str,
valuation_col: str,
amount_col: str,
cumulative: bool = True,
method: str = "volume",
tail: float = 1.0,
on_insufficient: str = "raise",
warn: bool = True
) -> dict[Any, ChainLadder]
Fit a chain-ladder development pattern per segment of df.
Groups df by groupby, builds a development triangle for each segment
(see make_completion_triangle), and fits a ChainLadder to each. Returns
{segment_key: ChainLadder}; the key is a scalar for a single grouping column,
or a tuple for several.
Segments too small to fit (fewer than two origins or development periods, a zero
cumulative, and so on) are handled by on_insufficient:
- "raise" (default): raise a ValueError naming the failing segment.
- "skip": omit those segments from the result.
- "aggregate": use the pooled pattern fit on the whole frame for them.
When on_insufficient is "skip" or "aggregate" and warn is true, an
InsufficientDataWarning naming the affected segments is emitted; warn=False
suppresses it (the standard warnings filters also apply). To ignore thin
segments entirely, use on_insufficient="skip", warn=False.
Source code in actuarialpy/reserving.py
completion_factors
completion_factors(
triangle: DataFrame,
*,
method: str = "volume",
tail: float = 1.0
) -> pd.Series
Completion factors by development period, via chain-ladder.
Convenience wrapper around ChainLadder: returns the proportion of ultimate
emerged by each development period (1 / cdf), estimated from a cumulative
triangle. These are divide-convention factors in (0, 1] (completed = paid /
factor). See ChainLadder for the full pattern and per-origin ultimate/IBNR.
Source code in actuarialpy/reserving.py
completion_factors_by
completion_factors_by(
df: DataFrame,
*,
groupby: str | list[str],
origin_col: str,
valuation_col: str,
amount_col: str,
cumulative: bool = True,
method: str = "volume",
tail: float = 1.0,
on_insufficient: str = "raise",
warn: bool = True,
development_name: str = "development_month"
) -> pd.DataFrame
Completion factors per segment as a tidy table.
Convenience over chain_ladder_by: one row per (segment, development period) with the
completion factor, ready to review, pivot, or join. Columns are the grouping
column(s), development_name, and completion_factor. on_insufficient and
warn behave as in chain_ladder_by.
Source code in actuarialpy/reserving.py
develop_ultimate
develop_ultimate(
df: DataFrame,
factors: Series | DataFrame,
*,
method: str = "bornhuetter_ferguson",
value_col: str,
date_col: str | None = None,
valuation_date: Any = None,
development_col: str | None = None,
apriori_col: str | None = None,
exposure_col: str | None = None,
by: str | list[str] | None = None,
factor_col: str = "completion_factor",
development_name: str = "development_month",
out_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Develop a paid amount to estimated ultimate by a chosen reserving method.
All methods share one input -- the proportion emerged at each row's development
period, joined exactly as :func:apply_completion does (flat Series or per-segment
table, beyond-the-triangle rows fully emerged). They differ only in how they combine
that with the paid-to-date and an a priori expectation:
"chain_ladder"--paid / emerged. Ignores the a priori; equivalent to :func:apply_completion. Volatile for immature periods (a thin latest diagonal drives the whole tail)."bornhuetter_ferguson"--paid + apriori * (1 - emerged). Takes the unemerged portion from the a priori rather than from the data, so it is stable for green periods. Requiresapriori_col(an expected ultimate per row -- an input, e.g. a plan, budget, or manual times exposure)."benktander"-- one Bornhuetter-Ferguson iteration using the BF ultimate as the a priori:paid + bf * (1 - emerged). A credibility blend sitting between BF and chain ladder (weightemergedon chain ladder). Requiresapriori_col."cape_cod"-- Bornhuetter-Ferguson with the a priori derived from the data: a single expected loss ratio per segment,sum(paid) / sum(exposure * emerged), times each row's exposure. Requiresexposure_col(an on-level premium / exposure per row). The loss ratio is mechanical; the exposure base is an input.
The library applies a method; it does not pick the a priori or the exposure base.
Supply either development_col or both date_col and valuation_date; pass
by with a per-segment factor table (and Cape Cod then derives one loss ratio per
segment). Returns df with an out_col (default f"{value_col}_ultimate").
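The following is a minimal numpy sketch that checks the four formulas by hand on made-up arrays. The values of paid, emerged, apriori and exposure are hypothetical, not library inputs. The sketch reproduces only the documented arithmetic. It does not cover develop_ultimate's factor join or its per-segment grouping.

```python
import numpy as np

# Hypothetical rows: paid-to-date, proportion emerged, a priori ultimate, exposure
paid = np.array([900.0, 700.0, 300.0])
emerged = np.array([1.0, 0.8, 0.4])
apriori = np.array([950.0, 900.0, 850.0])
exposure = np.array([1000.0, 1000.0, 1000.0])

chain_ladder = paid / emerged                    # ignores the a priori
bf = paid + apriori * (1 - emerged)              # unemerged part from the a priori
benktander = paid + bf * (1 - emerged)           # BF ultimate reused as the a priori

# Cape Cod: one expected loss ratio from the data, then BF with elr * exposure
elr = paid.sum() / (exposure * emerged).sum()
cape_cod = paid + elr * exposure * (1 - emerged)
```

Row by row, the Benktander result equals emerged * chain_ladder + (1 - emerged) * bf. This matches the credibility-blend reading given above.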
Source code in actuarialpy/reserving.py
development_months ¶
Whole months of development between incurred (origin) and valuation.
Either argument may be a scalar, a Series, or array-like, in any combination (e.g. a column of incurred dates against a single valuation date). The result is a Series when either argument is a Series, otherwise a scalar.
Source code in actuarialpy/reserving.py
ibnr ¶
IBNR as completed minus paid (the completed/paid identity).
Works element-wise on scalars or Series. completed and paid must be on
the same basis; the result is the amount bridging paid-to-date to ultimate.
Source code in actuarialpy/reserving.py
make_completion_triangle ¶
make_completion_triangle(
df: DataFrame,
*,
origin_col: str,
valuation_col: str,
amount_col: str,
cumulative: bool = True,
index_name: str = "origin_period",
development_name: str = "development_month"
) -> pd.DataFrame
Build a development (completion) triangle by origin period and development period.
Each cell aggregates amount_col for one origin month at one development period
(whole months between origin and valuation, via :func:development_months).
amount_col is treated as the incremental amount in each (origin, development period)
cell; with cumulative=True -- the default, and the usual basis for
estimating development/completion factors -- the cells are accumulated across
development period. Set cumulative=False to return the incremental triangle, or if your
input amounts are already cumulative-to-date snapshots.
This consumes a compact development aggregate (one row per origin x valuation, i.e. months x months); it does not require transaction/line-level data.
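The following pandas sketch shows the incremental-to-cumulative convention on a hypothetical origin x valuation aggregate. It assumes development is the calendar-month difference between origin and valuation. It uses plain pivot_table and cumsum and is not the library's implementation.

```python
import pandas as pd

# Hypothetical compact aggregate: incremental paid by origin month x valuation month
rows = pd.DataFrame({
    "origin": ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02", "2024-03"],
    "valuation": ["2024-01", "2024-02", "2024-03", "2024-02", "2024-03", "2024-03"],
    "paid": [50.0, 30.0, 10.0, 60.0, 25.0, 55.0],
})

# Development period as the calendar-month difference (an assumption for this sketch)
o = pd.to_datetime(rows["origin"])
v = pd.to_datetime(rows["valuation"])
rows["development_month"] = (v.dt.year - o.dt.year) * 12 + (v.dt.month - o.dt.month)

# Incremental triangle: origin down the rows, development period across the columns
incremental = rows.pivot_table(index="origin", columns="development_month",
                               values="paid", aggfunc="sum")

# Cumulative triangle (the cumulative=True basis); unobserved cells stay NaN
cumulative = incremental.cumsum(axis=1)
```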
Source code in actuarialpy/reserving.py
validate_completion_factors ¶
validate_completion_factors(
factors: DataFrame,
factor_col: str = "completion_factor",
*,
method: str = "divide"
) -> None
Validate completion-factor values for a selected convention.
divide factors (completed = paid / factor) should satisfy
0 < factor <= 1; multiply factors (completed = paid * factor) should
satisfy factor >= 1. Useful as a sanity check on estimated factors before
they are applied.
Source code in actuarialpy/reserving.py
credibility_weighted_estimate ¶
Blend an observed estimate with its complement at credibility z.
Returns z * observed + (1 - z) * complement. Scalar inputs return a
native float; pandas.Series inputs return a Series with the index
preserved; other array-like inputs return a numpy.ndarray. This is the
atomic credibility operation; the z may come from a model below, a filed
credibility formula, or any other source.
Source code in actuarialpy/credibility.py
full_credibility_claims ¶
full_credibility_claims(
*,
confidence: float = 0.9,
tolerance: float = 0.05,
severity_cv: float | None = None
) -> float
Classical full-credibility standard, in expected number of claims.
Returns the expected claim count for full credibility under the
limited-fluctuation model: (z / k) ** 2 for claim frequency, where z is
the standard-normal quantile for two-sided confidence and k is the
tolerance. The classic 90% / 5% choice gives about 1082 claims. Supplying
severity_cv (the coefficient of variation of individual claim severity)
inflates it to (z / k) ** 2 * (1 + severity_cv ** 2) for aggregate losses
rather than pure frequency.
Many shops use a filed standard instead; pass that straight to
:func:limited_fluctuation_z.
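You can reproduce the standard with the standard library's NormalDist. The function full_credibility_standard below is a hypothetical stand-in written from the formulas above. It is not the library function.

```python
from statistics import NormalDist

def full_credibility_standard(confidence=0.9, tolerance=0.05, severity_cv=None):
    """Hypothetical stand-in for the limited-fluctuation standard, in expected claims."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # two-sided quantile, about 1.645 at 90%
    n = (z / tolerance) ** 2                          # pure claim-frequency standard
    if severity_cv is not None:
        n *= 1 + severity_cv ** 2                     # aggregate losses, not pure frequency
    return n

print(round(full_credibility_standard()))  # 1082
```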
Source code in actuarialpy/credibility.py
limited_fluctuation_z ¶
Limited-fluctuation (classical) credibility factor -- the square-root rule.
Returns Z = min(1, sqrt(exposure / full_credibility_standard)). exposure
is the volume credibility is based on (claim counts, member months, life-years,
...) and full_credibility_standard is the amount of that volume required for
full (Z = 1) credibility -- often a filed value. Scalars return a native
float; pandas.Series inputs return a Series (index preserved); other
array-likes return a numpy.ndarray, so credibility can be computed per group.
Feed the result to :func:credibility_weighted_estimate to blend experience with
its complement.
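Here is a numpy sketch of the square-root rule feeding a credibility blend. The helper square_root_z and the claim counts, experience and manual PMPM values are hypothetical. The full-credibility standard is set to 1082 claims.

```python
import numpy as np

def square_root_z(exposure, full_standard):
    """Hypothetical stand-in: Z = min(1, sqrt(exposure / full_standard)), element-wise."""
    return np.minimum(1.0, np.sqrt(np.asarray(exposure, dtype=float) / full_standard))

claims = np.array([270.5, 1082.0, 4000.0])      # hypothetical claim counts per group
z = square_root_z(claims, 1082.0)               # -> 0.5, 1.0, 1.0 (capped at 1)
observed = np.array([410.0, 395.0, 388.0])      # hypothetical experience PMPM
complement = 400.0                               # hypothetical manual PMPM
blended = z * observed + (1 - z) * complement   # z * observed + (1 - z) * complement
```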
Source code in actuarialpy/credibility.py
add_months_in_force ¶
add_months_in_force(
df: DataFrame,
*,
effective_col: str,
period_start,
period_end,
termination_col: str | None = None,
out_col: str = "months_in_force",
copy: bool = True
) -> pd.DataFrame
Add whole months of overlap between each entity's in-force window and a period.
The in-force window is [effective, termination] (a missing termination
means the period end). The result is clipped to [period_start, period_end]
and floored at 0. Month counting is inclusive of both endpoint months, so an
entity in force for the whole of an N-month period returns N.
Source code in actuarialpy/lifecycle.py
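Below is a scalar sketch of the inclusive month-overlap rule. It assumes months are counted by calendar month, ignoring the day of the month. The helpers month_index and months_in_force_sketch are hypothetical and not part of actuarialpy.

```python
import pandas as pd

def month_index(date):
    """Running calendar-month count, so differences are whole months."""
    ts = pd.Timestamp(date)
    return ts.year * 12 + ts.month

def months_in_force_sketch(effective, period_start, period_end, termination=None):
    """Hypothetical stand-in: inclusive month overlap of the in-force window and period."""
    start = max(month_index(effective), month_index(period_start))
    # A missing termination means the period end
    stop = month_index(period_end if termination is None else termination)
    stop = min(stop, month_index(period_end))
    return max(stop - start + 1, 0)   # floored at 0 when the windows do not overlap

print(months_in_force_sketch("2024-04-15", "2024-01-01", "2024-12-31"))  # 9 (Apr..Dec)
```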
add_tenure ¶
add_tenure(
df: DataFrame,
effective_col: str,
as_of,
*,
tenure_col: str = "tenure_months",
one_based: bool = False,
copy: bool = True
) -> pd.DataFrame
Add tenure in whole months from each entity's effective date to as_of.
as_of is a single reference date (e.g. the experience as-of date). With
one_based=True an entity effective in the as-of month has tenure 1 rather
than 0, matching "months of experience" conventions.
Source code in actuarialpy/lifecycle.py
derive_status ¶
derive_status(
df: DataFrame,
*,
effective_col: str,
as_of,
termination_col: str | None = None,
first_year_months: int = 12,
status_col: str = "status",
labels: dict[str, str] | None = None,
copy: bool = True
) -> pd.DataFrame
Derive an active / first-year / termed status as of a reference date.
Classification (in precedence order):
- termed: a termination date is present and on/before as_of.
- first_year: not termed, and tenure (as_of minus effective) is less than first_year_months. The window is a parameter because "first year" means the first 12 months in some shops and the first policy year in others.
- active: in force beyond the first-year window.
labels optionally remaps the three canonical values, e.g.
{"first_year": "First Year Account", "termed": "Term"}.
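The precedence order can be expressed with np.select, where earlier conditions win. This sketch assumes tenure is the calendar-month difference and first_year_months = 12. The column names and dates are hypothetical.

```python
import numpy as np
import pandas as pd

as_of = pd.Timestamp("2024-12-31")
df = pd.DataFrame({
    "effective": pd.to_datetime(["2020-01-01", "2024-06-01", "2021-03-01"]),
    "termination": pd.to_datetime([None, None, "2024-09-30"]),
})

# Tenure as the calendar-month difference (an assumption for this sketch)
tenure = (as_of.year - df["effective"].dt.year) * 12 + (as_of.month - df["effective"].dt.month)
termed = df["termination"].notna() & (df["termination"] <= as_of)
first_year = ~termed & (tenure < 12)

# np.select takes the first matching condition, so "termed" has precedence
df["status"] = np.select([termed, first_year], ["termed", "first_year"], default="active")
```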
Source code in actuarialpy/lifecycle.py
earned_exposure ¶
earned_exposure(
df: DataFrame,
exposure_col: str,
*,
effective_col: str,
period_start,
period_end,
termination_col: str | None = None,
period_months: int | None = None,
out_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Prorate a full-period exposure by the fraction of the period in force.
earned = exposure * months_in_force / period_months. Use this when each
row carries a full-period exposure (e.g. annualized) that must be reduced for
mid-period entry or termination. If your data is already monthly, filtering
to in-force months with :func:is_in_force is usually simpler.
Source code in actuarialpy/lifecycle.py
is_in_force ¶
is_in_force(
df: DataFrame,
*,
effective_col: str,
period_start,
period_end,
termination_col: str | None = None
) -> pd.Series
Boolean Series: in force at any point during [period_start, period_end].
In force when effective on/before period_end and the entity had not
terminated before period_start (a missing termination date means still
in force).
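The in-force test is a two-sided date comparison. This pandas sketch uses hypothetical columns effective and termination, where NaT stands for a missing termination date.

```python
import pandas as pd

df = pd.DataFrame({
    "effective": pd.to_datetime(["2023-01-01", "2024-07-01", "2025-02-01", "2022-01-01"]),
    "termination": pd.to_datetime(["2023-12-31", None, None, None]),
})
period_start, period_end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-12-31")

# Effective on/before the period end, and not terminated before the period start
in_force = (df["effective"] <= period_end) & (
    df["termination"].isna() | (df["termination"] >= period_start)
)
```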
Source code in actuarialpy/lifecycle.py
assign_band ¶
assign_band(
df: DataFrame,
value_col: str,
bands: Sequence[float],
*,
labels: Sequence[str] | None = None,
band_col: str = "band",
right: bool = False,
copy: bool = True
) -> pd.DataFrame
Assign each row to an ordered size band based on value_col.
bands are bin edges. For integer counts the natural form is left-closed
(right=False), so bands=[0, 51, 76, 151, 251, 501, inf] yields
[0, 51), [51, 76), .... A trailing float("inf") captures the open
top band. The resulting column is an ordered categorical so downstream
group-bys keep band order.
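Left-closed bands map directly onto pandas.cut with right=False. This sketch uses the edges above with hypothetical group sizes and labels. pd.cut returns an ordered categorical, which keeps group-bys in band order.

```python
import numpy as np
import pandas as pd

groups = pd.DataFrame({"enrolled": [10, 51, 75, 76, 300, 1200]})
edges = [0, 51, 76, 151, 251, 501, np.inf]
labels = ["0-50", "51-75", "76-150", "151-250", "251-500", "501+"]

# right=False makes each bin left-closed: [0, 51), [51, 76), ..., [501, inf)
groups["band"] = pd.cut(groups["enrolled"], bins=edges, labels=labels, right=False)
```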
Source code in actuarialpy/banding.py
summarize_by_band ¶
summarize_by_band(
df: DataFrame,
value_col: str,
bands: Sequence[float],
*,
labels: Sequence[str] | None = None,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
band_col: str = "band",
ratio_col: str | None = None,
right: bool = False,
profile: str | None = None
) -> pd.DataFrame
Assign size bands then summarize experience grouped by band.
Returns one row per band in band order (empty bands included), with the same
aggregates, loss ratio, and per-exposure metrics as
:func:~actuarialpy.experience.summarize_experience.
Source code in actuarialpy/banding.py
adjust ¶
adjust(
df: DataFrame,
factors: float | int | Series | DataFrame,
*,
value_col: str,
on: str | list[str] | None = None,
by: str | list[str] | None = None,
how: str = "multiply",
factor_col: str = "factor",
out_col: str | None = None,
audit_col: str | None = None,
default: float | None = None,
copy: bool = True
) -> pd.DataFrame
Multiply or divide a column by a factor joined on a key.
The general factor-application primitive behind trend, benefit / area / demographic relativities, network discounts -- any per-key multiplier. The factor for each row is taken from one of:
- a scalar factors -- one factor for every row (e.g. a single trend factor);
- a Series indexed by on -- one key column (e.g. an area factor by region);
- a tidy DataFrame keyed by by + on with factor_col -- per-segment factors (the shape the *_by estimators return);
and applied to value_col: how="multiply" gives value * factor (loads,
trend), how="divide" gives value / factor (backing a factor out).
The join is by value (the frame's index never participates); the factor table must be
unique on its keys -- a duplicate would fan out the data -- which is enforced. An
absent key gives default (NaN when default is None -- a surfaced gap,
never silently filled); pass default=1.0 when a key missing from the table should
mean "no adjustment". With audit_col, the cumulative net multiplier applied to
value_col is accumulated there (factor for multiply, 1 / factor for
divide), so a chain of adjustments leaves a per-row record of total restatement.
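The following pandas sketch shows the value join and the uniqueness guard for a keyed factor table, using hypothetical claims and area-factor tables. It shows how="multiply", the NaN for an absent key, and the effect of a default of 1.0. It is not the library's implementation, and it omits audit_col.

```python
import pandas as pd

claims = pd.DataFrame({"region": ["N", "S", "W"], "paid": [100.0, 200.0, 300.0]})
area = pd.DataFrame({"region": ["N", "S"], "factor": [1.10, 0.95]})   # no row for W

# A duplicate key would fan rows out on the join, so refuse it up front
if area.duplicated(subset=["region"]).any():
    raise ValueError("factor table is not unique on its keys")

# Join by value on the key column; a left merge keeps the row order of claims
factor = claims[["region"]].merge(area, on="region", how="left")["factor"].to_numpy()

claims["paid_adj"] = claims["paid"] * factor   # multiply; W stays NaN (surfaced gap)
claims["paid_adj_default"] = claims["paid"] * pd.Series(factor).fillna(1.0).to_numpy()
```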
Source code in actuarialpy/adjustments.py
factor_lookup ¶
factor_lookup(
df: DataFrame,
factors: DataFrame,
keys: str | Iterable[str],
*,
factor_col: str,
default: float | None = None
) -> np.ndarray
Join a factor onto df by value on one or more existing key columns.
The single factor-join primitive behind grouped completion, seasonality, and
:func:adjust. factors is a tidy table containing keys and factor_col;
each row of df is matched on its keys values. The factor table must be unique
on keys -- a duplicate would fan rows out on the join -- so this raises otherwise.
Returns a float array aligned to df's row order (the frame's own index never
participates). An absent key gives default (NaN when default is None
-- a surfaced gap, never silently filled).
Source code in actuarialpy/columns.py
add_margin ¶
add_margin(
df: DataFrame,
*,
premium_col: str,
expense_cols: str | Iterable[str],
out_col: str = "margin",
ratio_col: str | None = None,
exposure_col: str | None = None,
per_exposure_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Add an underwriting-margin column (premium minus summed expense columns).
expense_cols is summed row-wise and may mix losses and loadings (e.g.
medical/claims, retention, commission, allocated overhead). Optionally also
add the margin ratio (ratio_col) and a per-exposure margin
(per_exposure_col, requires exposure_col) such as margin PMPM.
Source code in actuarialpy/margins.py
margin ¶
Margin = premium - expenses, element-wise.
expenses should already be the total of losses plus any loadings.
margin_ratio ¶
excess_over_threshold ¶
excess_over_threshold(
df: DataFrame,
loss_col: str,
threshold: float,
*,
keep_cols: str | Iterable[str] | None = None,
excess_col: str = "excess"
) -> pd.DataFrame
Return losses strictly above threshold with their excess amount.
excess = loss - threshold for rows where loss > threshold. This is
the excess-over-threshold sample used to fit a tail (e.g. a generalized
Pareto distribution in extremeloss) or a severity distribution in
lossmodels; the threshold is the EVT exceedance threshold / pooling
point. keep_cols carries identifier or covariate columns through.
Source code in actuarialpy/pooling.py
pool_losses ¶
pool_losses(
df: DataFrame,
loss_col: str,
pooling_point: float,
*,
pooled_col: str = "pooled_loss",
excess_col: str = "excess_loss",
copy: bool = True
) -> pd.DataFrame
Split each loss into a pooled (capped) portion and an excess portion.
pooled = min(loss, pooling_point) is the retained amount used in the
group's experience; excess = max(loss - pooling_point, 0) is the portion
pooled across the block. Summing pooled_col by group gives capped
experience; summing excess_col gives the pooled excess. The input is
typically one row per claimant (e.g. the output of summarize_claimants).
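The pooled/excess split is two element-wise clips. This pandas sketch uses hypothetical claimant rows and a pooling point of 100,000. By construction, pooled_loss + excess_loss reproduces each loss.

```python
import numpy as np
import pandas as pd

claimants = pd.DataFrame({"group": ["A", "A", "B"], "loss": [40_000.0, 310_000.0, 90_000.0]})
pooling_point = 100_000.0

claimants["pooled_loss"] = np.minimum(claimants["loss"], pooling_point)        # retained
claimants["excess_loss"] = np.maximum(claimants["loss"] - pooling_point, 0.0)  # pooled excess

# Capped experience and pooled excess by group
by_group = claimants.groupby("group")[["pooled_loss", "excess_loss"]].sum()
```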
Source code in actuarialpy/pooling.py
retained_cv ¶
Coefficient of variation of the retained aggregate of n_units iid units.
Each unit's outcome is retained (capped) at retention -- min(outcome,
retention) -- and n_units such units are summed. For independent units
this CV is cv(min(X, retention)) / sqrt(n_units), where X is drawn from
the per-unit outcome sample outcomes (array-like). Capping discards
everything above retention, so only the body of outcomes matters.
Parameters¶
outcomes : array-like
Per-unit outcome sample (e.g. one value per member-year, claim, or risk).
retention : float or array-like
Cap applied to each unit. A scalar returns a float; an array returns the CV at each retention.
n_units : int, default 1
Number of independent units in the aggregate.
Returns¶
float or numpy.ndarray
Coefficient of variation of the retained aggregate.
Source code in actuarialpy/pooling.py
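Here is a numpy sketch of the formula that broadcasts over one or more retentions. The function retained_cv_sketch is hypothetical. It assumes the population standard deviation (ddof=0), and the library may use a different estimator.

```python
import numpy as np

def retained_cv_sketch(outcomes, retention, n_units=1):
    """Hypothetical stand-in for retained_cv, using the population std (ddof=0)."""
    x = np.asarray(outcomes, dtype=float)[:, None]                  # units down the rows
    u = np.atleast_1d(np.asarray(retention, dtype=float))[None, :]  # one column per retention
    capped = np.minimum(x, u)                                       # min(outcome, retention)
    cv = capped.std(axis=0) / capped.mean(axis=0) / np.sqrt(n_units)
    return float(cv[0]) if np.ndim(retention) == 0 else cv
```

Capping everything at the smallest outcome gives a CV of 0, and quadrupling n_units halves the CV.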
retention_for_target_cv ¶
Retention at which the retained aggregate of n_units units hits a target CV.
Inverts :func:retained_cv. The single-unit retained CV increases with the
retention, so this solves retained_cv(outcomes, u, n_units=n_units) ==
target_cv for the retention u by interpolation over a grid spanning
bounds (default min..max of outcomes). Targets below or above the
achievable range clamp to the lower or upper bound. Holding target_cv fixed,
a larger n_units yields a higher retention (more independent units stabilize
the aggregate, so less needs to be capped) -- i.e. the basis for a size-graded
retention rule.
Parameters¶
outcomes : array-like
Per-unit outcome sample.
n_units : int
Number of independent units in the aggregate.
target_cv : float
Desired coefficient of variation of the retained aggregate.
bounds : tuple(float, float), optional
(lo, hi) retention search bounds. Defaults to the min and max of
outcomes.
n_grid : int, default 256
Number of grid points spanning bounds.
Returns¶
float
The retention level, clamped to bounds.
Source code in actuarialpy/pooling.py
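Below is a grid-and-interpolate sketch of the inversion using numpy.interp, which clamps targets outside the tabulated range to the end points. The function retention_for_target_cv_sketch is hypothetical and, like the sketch of retained_cv above, assumes ddof=0. The running maximum guards np.interp's requirement that the tabulated CVs be nondecreasing.

```python
import numpy as np

def retention_for_target_cv_sketch(outcomes, n_units, target_cv, bounds=None, n_grid=256):
    """Hypothetical stand-in: retention whose retained CV hits target_cv."""
    x = np.asarray(outcomes, dtype=float)
    lo, hi = bounds if bounds is not None else (x.min(), x.max())
    grid = np.linspace(lo, hi, n_grid)                   # candidate retentions
    capped = np.minimum(x[:, None], grid[None, :])       # one column per retention
    cv = capped.std(axis=0) / capped.mean(axis=0) / np.sqrt(n_units)
    cv = np.maximum.accumulate(cv)                       # enforce nondecreasing for np.interp
    return float(np.interp(target_cv, cv, grid))         # clamps to lo / hi outside the range

x = np.arange(1, 101, dtype=float)
r_small = retention_for_target_cv_sketch(x, n_units=1, target_cv=0.25)
r_large = retention_for_target_cv_sketch(x, n_units=2, target_cv=0.25)   # higher retention
```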
status_summary ¶
status_summary(
df: DataFrame,
*,
status_col: str,
entity_col: str | None = None,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
profile: str | None = None
) -> pd.DataFrame
Summarize experience by status, optionally adding entity counts.
Source code in actuarialpy/experience.py
summarize_experience ¶
summarize_experience(
df: DataFrame,
*,
groupby: str | Iterable[str] | None = None,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
ratio_col: str | None = None,
ratio_name: str | None = None,
total_expense_name: str = "total_expense",
total_revenue_name: str = "total_revenue",
profile: str | None = None,
labels: dict[str, str] | None = None
) -> pd.DataFrame
Summarize experience by grouping columns.
Amounts and exposures are aggregated first. Ratios and per-exposure metrics are calculated after aggregation, which avoids averaging row-level ratios.
By default the ratio column is named loss_ratio (general across lines of
business); the health profile names it mlr and the life profile names it
benefit_ratio. profile only supplies light defaults and does not
rename total expense or total revenue.
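This small pandas sketch with hypothetical data shows why aggregation comes first. The PPO loss ratio from summed amounts differs from the average of its two row-level ratios.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO"],
    "paid": [90.0, 10.0, 50.0],
    "premium": [100.0, 300.0, 50.0],
})

# Aggregate amounts first, then divide
summary = df.groupby("product")[["paid", "premium"]].sum()
summary["loss_ratio"] = summary["paid"] / summary["premium"]   # PPO: 100 / 400 = 0.25

# For contrast: averaging row-level ratios overweights the small-premium row
naive_ppo = (df["paid"] / df["premium"])[df["product"] == "PPO"].mean()   # about 0.467
```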
Source code in actuarialpy/experience.py
summarize_views ¶
summarize_views(
df: DataFrame,
*,
views: dict[str, str | Iterable[str] | None],
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
ratio_col: str | None = None,
ratio_name: str | None = None,
total_expense_name: str = "total_expense",
total_revenue_name: str = "total_revenue",
profile: str | None = None
) -> dict[str, pd.DataFrame]
Create multiple experience summary views from the same input data.
Source code in actuarialpy/experience.py
summarize_actual_vs_expected ¶
summarize_actual_vs_expected(
df: DataFrame,
*,
groupby: str | Iterable[str] | None = None,
actual_cols: str | Iterable[str],
expected_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
actual_name: str = "actual",
expected_name: str = "expected",
ae_name: str = "actual_to_expected",
variance_name: str = "variance",
variance_pct_name: str = "variance_pct"
) -> pd.DataFrame
Summarize actual-versus-expected results by optional grouping columns.
Actual and expected amounts are aggregated before ratios are calculated. This makes the function suitable for claim costs, benefits, expenses, revenue, or any other actual-versus-expected measure.
Source code in actuarialpy/expected.py
summarize_claimants ¶
summarize_claimants(
df: DataFrame,
*,
claimant_col: str,
amount_cols: str | Iterable[str],
groupby: str | Iterable[str] | None = None,
exposure_col: str | None = None,
amount_name: str = "total_expense"
) -> pd.DataFrame
Aggregate experience to claimant/member/risk level.
claimant_col can be a member ID, policy ID, claim group ID, or another
entity identifier. The function is descriptive; it does not cap, pool, or
otherwise adjust the underlying amounts.
Source code in actuarialpy/claimants.py
top_claimants ¶
top_claimants(
df: DataFrame,
*,
claimant_col: str,
amount_cols: str | Iterable[str] | None = None,
amount_col: str | None = None,
groupby: str | Iterable[str] | None = None,
n: int = 25,
amount_name: str = "total_expense"
) -> pd.DataFrame
Return the top claimants by amount, optionally within each group.
Source code in actuarialpy/claimants.py
large_claimant_flags ¶
large_claimant_flags(
df: DataFrame,
*,
amount_col: str = "total_expense",
thresholds: Sequence[float] = (50000, 100000, 250000)
) -> pd.DataFrame
Add boolean flags for claimants above one or more amount thresholds.
Source code in actuarialpy/claimants.py
claim_concentration ¶
claim_concentration(
df: DataFrame,
*,
amount_col: str = "total_expense",
groupby: str | Iterable[str] | None = None,
top_n: Sequence[int] = (10, 25),
thresholds: Sequence[float] = (50000, 100000, 250000)
) -> pd.DataFrame
Summarize how concentrated total amounts are among top claimants.
The input should generally be one row per claimant within the requested
grouping level, such as the output of summarize_claimants.
Source code in actuarialpy/claimants.py
rolling_summary ¶
rolling_summary(
df: DataFrame,
*,
date_col: str,
window: int = 12,
groupby: str | Iterable[str] | None = None,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
min_periods: int | None = None,
drop_incomplete: bool = True,
ratio_col: str = "loss_ratio"
) -> pd.DataFrame
Calculate rolling sums and ratios by period and optional grouping.
The output includes period_start and period_end. By default only
complete rolling windows are returned; for a 12-month window, the first
output row appears after 12 months of data are available.
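The following pandas sketch computes a 12-month rolling loss ratio and keeps complete windows only. It assumes one row per month with no gaps, and the data are hypothetical.

```python
import pandas as pd

monthly = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=14, freq="M"),
    "paid": [100.0] * 12 + [130.0, 130.0],
    "premium": [125.0] * 14,
})

# Sum amounts over the window first, then take the ratio
roll = monthly[["paid", "premium"]].rolling(window=12, min_periods=12).sum()
roll["loss_ratio"] = roll["paid"] / roll["premium"]
roll["period_end"] = monthly["month"]
roll["period_start"] = monthly["month"] - 11     # 12 months inclusive
roll = roll.dropna(subset=["paid"])              # drop incomplete windows
```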
Source code in actuarialpy/rolling.py
annualized_trend ¶
Annualize change between two values separated by a number of months.
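The docstring does not spell out the formula. Assuming the usual compound-annual-growth convention, the two-point annualization is (current / prior) ** (12 / months) - 1. The function annualized_change is a hypothetical stand-in.

```python
def annualized_change(current, prior, months):
    """Hypothetical two-point CAGR: (current / prior) ** (12 / months) - 1."""
    return (current / prior) ** (12.0 / months) - 1.0

print(round(annualized_change(121.0, 100.0, 24), 6))  # 0.1: 21% over two years is 10% a year
```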
Source code in actuarialpy/trend.py
fit_trend ¶
fit_trend(
df: DataFrame,
*,
value_col: str,
date_col: str,
exposure_col: str | None = None,
freq: str = "M",
min_periods: int = 3,
confidence: float = 0.95
) -> TrendFit
Fit an exponential trend to a rate series by log-linear regression.
Aggregates df to the freq grain (summing value_col and, if given,
exposure_col), forms the rate -- value / exposure (e.g. PMPM) when
exposure_col is supplied, otherwise value itself -- and fits
log(rate) = intercept + slope * t by ordinary least squares, with t in years
from the first period. The fitted annual trend is exp(slope) - 1.
Unlike :func:annualized_trend (a two-point CAGR between a single current and prior
value), this uses every period, so one noisy month does not swing the estimate, and it
returns goodness of fit and a confidence interval -- what a developed (rather than
received) trend is judged on. It does not select the trend: the window, the rate basis
(allowed vs paid), any benefit leveraging, and the blend with external trends remain
judgment. Run it on completed, deseasonalized history (complete -> deseasonalize ->
fit_trend) so runout and seasonality do not contaminate the slope; apply the result
with :func:trend_factor/:meth:TrendFit.factor or :func:adjust.
Time is measured from actual period dates, so an occasional missing period is handled
correctly. Requires at least min_periods distinct periods with strictly positive
rates (non-positive values, which cannot be logged, raise). Returns a :class:TrendFit.
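Here is a numpy sketch of the log-linear fit on hypothetical PMPMs built to grow 6% a year with a small alternating wobble. As described above, time comes from the period dates, and np.polyfit supplies the least-squares slope. The goodness of fit and confidence interval that fit_trend also returns are not reproduced here.

```python
import numpy as np
import pandas as pd

months = pd.period_range("2022-01", periods=24, freq="M")

# Years since the first period, from the period dates themselves
month_number = np.asarray(months.year * 12 + months.month, dtype=float)
t = (month_number - month_number[0]) / 12.0

# Hypothetical PMPM growing 6% a year, with a small alternating wobble
wobble = np.where(np.arange(24) % 2 == 0, 1.01, 0.99)
pmpm = 400.0 * 1.06 ** t * wobble

# OLS fit of log(rate) = intercept + slope * t
slope, intercept = np.polyfit(t, np.log(pmpm), 1)
annual_trend = np.exp(slope) - 1   # close to 0.06
```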
Source code in actuarialpy/trend.py
midpoint_trend_factor ¶
Trend factor between base and projection midpoints.
Source code in actuarialpy/trend.py
period_change ¶
project_forward ¶
trend_factor ¶
trend_summary ¶
trend_summary(
df: DataFrame,
*,
period_col: str | None = None,
prior_period=None,
current_period=None,
date_col: str | None = None,
prior_start=None,
prior_end=None,
current_start=None,
current_end=None,
groupby=None,
amount_col: str,
exposure_col: str | None = None,
prior_filter=None,
current_filter=None,
prior_label: str = "prior",
current_label: str = "current"
) -> pd.DataFrame
Summarize current vs prior trend by optional grouping.
Supported comparison modes:
- period_col='year', prior_period=2025, current_period=2026
- date_col='incurred_date' with prior/current start and end dates
- explicit boolean prior_filter and current_filter masks
Source code in actuarialpy/trend.py
component_driver_analysis ¶
component_driver_analysis(
df: DataFrame,
*,
period_col: str | None = None,
prior_period=None,
current_period=None,
date_col: str | None = None,
prior_start=None,
prior_end=None,
current_start=None,
current_end=None,
prior_filter=None,
current_filter=None,
component_cols: str | Iterable[str],
exposure_col: str | None = None,
groupby: str | Iterable[str] | None = None
) -> pd.DataFrame
Explain component drivers of change between two periods.
The primary comparison is based on component totals, or component amount per
exposure when exposure_col is supplied. The API matches trend_summary
and supports period-column, date-range, or explicit-filter comparisons.
Source code in actuarialpy/components.py
component_trend ¶
Alias for component_driver_analysis.
The preferred name is component_driver_analysis because the function
explains drivers of total component change, not just component-specific trend.
Source code in actuarialpy/components.py
summarize_components ¶
summarize_components(
df: DataFrame,
*,
groupby: str | Iterable[str] | None = None,
component_cols: str | Iterable[str],
exposure_col: str | None = None,
total_col: str = "total_expense",
include_shares: bool = True
) -> pd.DataFrame
Summarize component/category amounts, per-exposure values, and shares.
Source code in actuarialpy/components.py
cohort_summary ¶
cohort_summary(
df: DataFrame,
*,
entity_col: str,
date_col: str,
start_date_col: str,
duration_months: int = 12,
groupby: str | Iterable[str] | None = None,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
profile: str | None = None
) -> pd.DataFrame
Summarize each entity's first N months or cohort-duration window.
Each entity is clipped to its own first duration_months months of duration
(month 1 is the entity's start month), aligning entities by tenure rather than
calendar time. The output also reports how much of that window is actually
present, so partial (not-yet-mature) cohorts can be spotted and excluded:
- months_observed: count of distinct duration months present (1..N).
- last_month: latest experience month observed; with first_month this gives the available range.
- complete: whether the full window is present, i.e. months_observed == duration_months.
For example, to keep only cohorts with a full first year::
cohorts = exp.cohort(entity_col="group", start_date_col="effective_date")
mature = cohorts[cohorts["complete"]]
Source code in actuarialpy/cohorts.py
cohort_summary_by_period ¶
cohort_summary_by_period(
cohort_df: DataFrame,
*,
cohort_date_col: str = "first_month",
freq: str = "Q",
entity_col: str | None = None,
expense_col: str = "total_expense",
revenue_col: str = "total_revenue",
exposure_cols: str | Iterable[str] | None = None
) -> pd.DataFrame
Roll entity-level cohort summaries into cohort month/quarter/year buckets.
Source code in actuarialpy/cohorts.py
duration_summary ¶
duration_summary(
df: DataFrame,
*,
entity_col: str,
date_col: str,
start_date_col: str,
expense_cols: str | Iterable[str],
revenue_cols: str | Iterable[str],
exposure_cols: str | Iterable[str] | None = None,
max_duration_month: int | None = None
) -> pd.DataFrame
Summarize experience by duration month since entity start.
Source code in actuarialpy/cohorts.py
decompose_pmpm_trend ¶
decompose_pmpm_trend(
prior: DataFrame,
current: DataFrame,
*,
count_col: str,
loss_col: str,
exposure_col: str,
on: str | Iterable[str] | None = None,
mix_by: str | Iterable[str] | None = None,
annualization: float = 12
) -> pd.DataFrame
Decompose the PMPM change from prior to current.
With mix_by omitted this is the two-way split: both frames are summarized with
:func:frequency_severity_summary (optionally by the on keys), aligned, and the
change reported two exact ways:
- Multiplicative trend: pmpm_trend == util_trend * cost_trend, where util_trend is the frequency ratio and cost_trend the severity ratio.
- Additive dollars: pmpm_change == util_effect + cost_effect via a symmetric (midpoint) split, so the contributions sum exactly to the PMPM change.
Pass mix_by (a column or list of columns) to add a third mix component. PMPM
is then decomposed into utilization, unit cost, and the effect of the membership
composition shifting across the mix_by cells. Utilization and unit cost are
measured within each cell (free of composition), and mix captures the aggregate
movement that comes purely from the cell weights changing -- the piece the two-way split
would otherwise misattribute to utilization and unit cost. The split uses the LMDI
(logarithmic mean Divisia index) convention, which is order-free and reconciles
exactly: pmpm_trend == util_trend * cost_trend * mix_trend and
pmpm_change == util_effect + cost_effect + mix_effect.
A list of columns in mix_by defines the cells as their cross -- one blended mix
term, not a per-column attribution; to attribute mix to each dimension separately,
run the decomposition once per dimension. on and mix_by are orthogonal:
on groups the output rows, mix_by defines the mix cells within each group.
Every cell must have positive count, loss, and exposure in both periods.
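This plain-Python sketch checks the two-way identities on hypothetical prior and current totals. It assumes "symmetric (midpoint) split" means each effect is priced at the average of the other factor across the two periods. The three-way LMDI split with mix_by is not shown.

```python
# Hypothetical prior and current totals (claims, paid loss, member months)
prior = {"count": 500.0, "loss": 100_000.0, "exposure": 1_000.0}
current = {"count": 540.0, "loss": 113_400.0, "exposure": 1_000.0}

def freq_sev(totals):
    frequency = totals["count"] / totals["exposure"]   # claims per member month
    severity = totals["loss"] / totals["count"]         # loss per claim
    return frequency, severity

f0, s0 = freq_sev(prior)
f1, s1 = freq_sev(current)
pmpm0, pmpm1 = f0 * s0, f1 * s1                          # 100.0 -> 113.4

# Multiplicative: the PMPM ratio factors into utilization and unit-cost ratios
util_trend, cost_trend = f1 / f0, s1 / s0                # 1.08 * 1.05 = 1.134

# Additive: price each change at the midpoint of the other factor
util_effect = (f1 - f0) * (s0 + s1) / 2                  # about 8.2
cost_effect = (s1 - s0) * (f0 + f1) / 2                  # about 5.2; sum is 13.4
```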
Source code in actuarialpy/decomposition.py
frequency_severity_summary ¶
frequency_severity_summary(
df: DataFrame,
*,
count_col: str,
loss_col: str,
exposure_col: str,
groupby: str | Iterable[str] | None = None,
annualization: float = 12
) -> pd.DataFrame
Per-group claim frequency, severity, and PMPM.
Counts, losses, and exposure are aggregated first, then the rates are derived
after aggregation (avoiding averaging row-level rates). The identity
pmpm == frequency * severity holds for every row. frequency is claims per
exposure unit (per member month for monthly data), severity is loss per claim,
util_per_1000 is annualized claims per 1,000 members, and pmpm is loss per
exposure unit.
Source code in actuarialpy/decomposition.py
add_business_days ¶
add_business_days(
df: DataFrame,
date_col: str,
*,
freq: str = "M",
out_col: str = "business_days",
holidays: Any = "us_federal",
weekmask: str = "Mon Tue Wed Thu Fri",
copy: bool = True
) -> pd.DataFrame
Add a column with the number of business days in each row's period.
Divide a paid-amount column by this to get an amount-per-business-day series that is comparable across short and long months.
Source code in actuarialpy/seasonality.py
apply_seasonality ¶
apply_seasonality(
df: DataFrame,
factors: Series | DataFrame,
*,
date_col: str,
value_col: str,
freq: str = "M",
by: str | list[str] | None = None,
factor_col: str = "seasonal_factor",
season_name: str = "season",
out_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Multiply value_col by each row's seasonal factor, adding the pattern back.
factors may be flat (Series indexed by season) or a tidy per-segment table joined
on by plus season; see :func:deseasonalize for the grouped-table contract.
Source code in actuarialpy/seasonality.py
business_days_in_period ¶
business_days_in_period(
periods: Any,
*,
freq: str = "M",
holidays: Any = "us_federal",
weekmask: str = "Mon Tue Wed Thu Fri"
) -> pd.Series
Count business days (weekdays minus holidays) in each distinct period.
periods is any set of dates; they are mapped to their period (month or
quarter) and de-duplicated. holidays is "us_federal" (pandas' built-in
US federal calendar), None (weekdays only), or a list of holiday dates.
weekmask controls which weekdays count. Returns a Series indexed by period
start timestamp.
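The following sketch uses numpy.busday_count with pandas' USFederalHolidayCalendar for the first quarter of 2024. busday_count treats the end date as exclusive, so each period runs from its first day up to, but not including, the first day of the next period.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

months = pd.period_range("2024-01", "2024-03", freq="M")
begin = months.to_timestamp().values.astype("datetime64[D]")        # first day of each month
end = (months + 1).to_timestamp().values.astype("datetime64[D]")    # first day of next month
holidays = USFederalHolidayCalendar().holidays(start="2024-01-01", end="2024-12-31")

counts = np.busday_count(begin, end, weekmask="Mon Tue Wed Thu Fri",
                         holidays=holidays.values.astype("datetime64[D]"))
business_days = pd.Series(counts, index=months.to_timestamp())   # indexed by period start
```

January loses New Year's Day and Martin Luther King Jr. Day, and February loses Presidents Day.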
Source code in actuarialpy/seasonality.py
deseasonalize ¶
deseasonalize(
df: DataFrame,
factors: Series | DataFrame,
*,
date_col: str,
value_col: str,
freq: str = "M",
by: str | list[str] | None = None,
factor_col: str = "seasonal_factor",
season_name: str = "season",
out_col: str | None = None,
copy: bool = True
) -> pd.DataFrame
Divide value_col by each row's seasonal factor, removing the pattern.
factors is either a flat Series indexed by season (one pattern for the whole frame) or
a tidy per-segment DataFrame with the grouping column(s), a season column (season_name)
and a factor column (factor_col). This is the shape seasonality_factors_by
returns. A grouped table is joined on by plus season by column value, so its index is
ignored. The factor table must be unique on by + [season], and a row whose
(group, season) pair is absent from it yields NaN.
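The grouped-table contract can be sketched as a plain pandas left merge. The product names, the season key (calendar month) and the factor values are all illustrative assumptions. validate="many_to_one" makes pandas raise if the factor table is not unique on the join keys. A row with no matching (group, season) pair gets a NaN factor and therefore a NaN result.

```python
import numpy as np
import pandas as pd

# Hypothetical tidy factor table, unique on ["product", "season"].
factors = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO", "HMO"],
    "season": [1, 2, 1, 2],
    "seasonal_factor": [0.9, 1.1, 0.8, 1.2],
})
df = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-01", "2024-03"],
    "product": ["PPO", "PPO", "HMO", "HMO"],
    "paid": [90.0, 110.0, 80.0, 100.0],
})
df["season"] = pd.PeriodIndex(df["month"], freq="M").month

# Join by value on group + season; left join keeps df's rows and order.
out = df.merge(factors, on=["product", "season"], how="left", validate="many_to_one")
out["paid_sa"] = out["paid"] / out["seasonal_factor"]
```

The HMO March row has no factor in the table, so its deseasonalized value is NaN.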
Source code in actuarialpy/seasonality.py
seasonality_factors ¶
seasonality_factors(
df: DataFrame,
*,
date_col: str,
value_col: str,
exposure_col: str | None = None,
freq: str = "M",
method: str = "ratio_to_moving_average",
aggregate: str = "mean",
exclude: Iterable[int] | None = None,
min_years: int = 2
) -> pd.Series
Estimate seasonal factors -- one multiplier per calendar period, mean 1.0.
The series is first aggregated to the period grain (summing value_col and, if
given, exposure_col). With exposure_col the factors are computed on the
rate value / exposure (e.g. PMPM), which is the right basis for health
seasonality; without it they are computed on the value directly.
Methods:
"ratio_to_moving_average" (default): classical multiplicative decomposition. Each period is divided by a centered moving average (which removes trend and level), and the seasonal factor for a calendar period is the average of those ratios across years. Robust to trend and membership growth.
"period_share": each period is expressed as a share of its own year's average, then averaged by calendar period. Simpler, but assumes little within-year trend.
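Here is a sketch of the classical ratio-to-moving-average idea applied to a monthly rate series. It assumes a 2x12 centered moving average, the usual choice for monthly data. This page does not document the library's exact window or edge handling, and ratio_to_moving_average is a hypothetical helper. In the example, PMPM is constant while membership grows. Dividing by exposure first therefore leaves a purely seasonal rate, and the recovered factors match the input pattern.

```python
import numpy as np
import pandas as pd

def ratio_to_moving_average(rate: pd.Series) -> pd.Series:
    """Hypothetical sketch: monthly seasonal factors from a PeriodIndex-ed rate series."""
    # Centered 2x12 moving average: weight 1/24 on the two ends, 1/12 on the 11 inner months.
    weights = np.r_[0.5, np.ones(11), 0.5] / 12
    ma = rate.rolling(13, center=True).apply(lambda w: float(np.dot(w, weights)), raw=True)
    ratios = (rate / ma).dropna()  # edge months without a full window drop out
    # Average each calendar month's ratios across years, then normalize to mean 1.0.
    by_month = ratios.groupby(ratios.index.month).mean()
    return by_month / by_month.mean()

months = pd.period_range("2021-01", periods=36, freq="M")
pattern = np.array([1.05, 0.95, 1.00, 1.00, 0.98, 1.02, 0.97, 1.03, 1.00, 1.00, 0.96, 1.04])
pattern = pattern / pattern.mean()

members = np.linspace(1000, 1350, 36)         # growing exposure
paid = members * 400.0 * np.tile(pattern, 3)  # constant PMPM times seasonality
rate = pd.Series(paid / members, index=months)

factors = ratio_to_moving_average(rate)
```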
aggregate is "mean" or "median" (median is more robust to outlier
months). exclude drops whole years from the estimate -- e.g.
exclude=[2020, 2021] to keep COVID-distorted years out of the factors. A
warning is raised when fewer than min_years years inform any period. Factors
are normalized to average exactly 1.0.
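A small hypothetical helper can illustrate the "period_share" method together with the aggregate and exclude options. period_share below is not the library's function, and the distortion in 2020 is invented. Dropping that year with exclude, or switching to the median, recovers the underlying pattern. The plain mean does not.

```python
import numpy as np
import pandas as pd

def period_share(rate: pd.Series, aggregate: str = "mean", exclude=()) -> pd.Series:
    """Hypothetical sketch of period-share factors with aggregate/exclude options."""
    r = rate[~rate.index.year.isin(list(exclude))]  # drop whole years
    share = r / r.groupby(r.index.year).transform("mean")  # share of own year's average
    by_month = share.groupby(share.index.month).agg(aggregate)  # "mean" or "median"
    return by_month / by_month.mean()

months = pd.period_range("2020-01", periods=48, freq="M")
pattern = np.array([1.05, 0.95, 1.00, 1.00, 0.98, 1.02, 0.97, 1.03, 1.00, 1.00, 0.96, 1.04])
pattern = pattern / pattern.mean()
levels = {2020: 380.0, 2021: 400.0, 2022: 420.0, 2023: 440.0}

shape = np.tile(pattern, 4)
shape[:12] = np.r_[np.ones(3), np.full(3, 0.6), np.ones(6)]  # invented 2020 distortion
rate = pd.Series(shape * np.array([levels[y] for y in months.year]), index=months)

f_mean = period_share(rate)
f_excluded = period_share(rate, exclude=[2020])
f_median = period_share(rate, aggregate="median")
```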
Source code in actuarialpy/seasonality.py
seasonality_factors_by ¶
seasonality_factors_by(
df: DataFrame,
*,
groupby: str | list[str],
date_col: str,
value_col: str,
exposure_col: str | None = None,
freq: str = "M",
method: str = "ratio_to_moving_average",
aggregate: str = "mean",
exclude: Iterable[int] | None = None,
min_years: int = 2,
season_name: str = "season",
warn: bool = True
) -> pd.DataFrame
Seasonal factors per segment as a tidy table.
Fits seasonality_factors within each segment of groupby and stacks the
results into one row per (segment, season). The columns are the grouping column(s),
season_name, and seasonal_factor, which is the shape deseasonalize and
apply_seasonality consume via by=. Seasons absent from a segment's history
are omitted for that segment (they surface as NaN on join). Set warn=False to
silence the per-segment InsufficientDataWarning for thin histories.
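This sketch shows the stacking into the tidy (segment, season, seasonal_factor) shape. It substitutes simple period-share factors for the library's estimator and supports a single grouping column only. month_factors and factors_by are hypothetical helpers. The HMO segment has only January to June history, so it contributes only six rows.

```python
import numpy as np
import pandas as pd

def month_factors(rate: pd.Series) -> pd.Series:
    """Hypothetical period-share factors for a monthly rate series with a PeriodIndex."""
    share = rate / rate.groupby(rate.index.year).transform("mean")
    f = share.groupby(share.index.month).mean()
    return f / f.mean()

def factors_by(df, groupby, date_col, value_col, exposure_col, season_name="season"):
    """Hypothetical sketch: fit per segment, stack into one row per (segment, season)."""
    pieces = []
    for key, seg in df.groupby(groupby):
        # Aggregate to the monthly grain, then fit on the rate value / exposure.
        sums = (
            seg.assign(period=pd.PeriodIndex(seg[date_col], freq="M"))
            .groupby("period")[[value_col, exposure_col]]
            .sum()
        )
        f = month_factors(sums[value_col] / sums[exposure_col])
        pieces.append(pd.DataFrame({
            groupby: key,
            season_name: f.index.to_numpy(),  # only seasons present in this segment
            "seasonal_factor": f.to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)

pattern = np.array([1.05, 0.95, 1.00, 1.00, 0.98, 1.02, 0.97, 1.03, 1.00, 1.00, 0.96, 1.04])
pattern = pattern / pattern.mean()
months = pd.period_range("2024-01", periods=24, freq="M")
ppo = pd.DataFrame({"product": "PPO", "month": months.astype(str),
                    "paid": 400.0 * 1000 * np.tile(pattern, 2), "member_months": 1000})
hmo_months = months[months.month <= 6]  # HMO history covers January to June only
hmo = pd.DataFrame({"product": "HMO", "month": hmo_months.astype(str),
                    "paid": 300.0 * 500, "member_months": 500})

table = factors_by(pd.concat([ppo, hmo], ignore_index=True),
                   "product", "month", "paid", "member_months")
```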