author		Hugh Dickins <hughd@google.com>			2022-11-02 18:53:45 -0700
committer	Andrew Morton <akpm@linux-foundation.org>	2022-11-30 15:58:47 -0800
commit		9bd3155ed83b723be719e522760f107229e2a61b (patch)
tree		7d2f26440a6085eeffd6c827c7134758624923fa /mm/rmap.c
parent		cb67f4282bf9693658dbda934a441ddbbb1446df (diff)
mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts
Fix the races in maintaining compound_mapcount, subpages_mapcount and
subpage _mapcount by using PG_locked in the first tail of any compound
page for a bit_spin_lock() on such modifications; skipping the usual
atomic operations on those fields in this case.
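In outline, the two locking helpers (condensed from the patch below) take the
bit spinlock in the first tail page, copy both fields out to a caller-local
struct in their logical form, and write them back on unlock:

	static void lock_compound_mapcounts(struct page *head,
			struct compound_mapcounts *local)
	{
		bit_spin_lock(PG_locked, &head[1].flags);
		/* +1 converts from the stored offset-by-minus-1 representation */
		local->compound_mapcount = atomic_read(compound_mapcount_ptr(head)) + 1;
		local->subpages_mapcount = atomic_read(subpages_mapcount_ptr(head));
	}

	static void unlock_compound_mapcounts(struct page *head,
			struct compound_mapcounts *local)
	{
		/* convert back to the stored representation, then release */
		atomic_set(compound_mapcount_ptr(head), local->compound_mapcount - 1);
		atomic_set(subpages_mapcount_ptr(head), local->subpages_mapcount);
		bit_spin_unlock(PG_locked, &head[1].flags);
	}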
Bring page_remove_file_rmap() and page_remove_anon_compound_rmap() back
into page_remove_rmap() itself. Rearrange page_add_anon_rmap() and
page_add_file_rmap() and page_remove_rmap() to follow the same "if
(compound) {lock} else if (PageCompound) {lock} else {atomic}" pattern
(with a PageTransHuge in the compound test, like before, to avoid BUG_ONs
and optimize away that block when THP is not configured). Move all the
stats updates outside, after the bit_spin_locked section, so that it is
sure to be a leaf lock.
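Schematically, each of the three functions now follows this skeleton
(simplified; the real functions, shown in the diff below, also gather the
statistics deltas while the lock is held):

	if (compound && PageTransHuge(page)) {
		/* pmd-mapped THP: adjust compound_mapcount under the bit lock */
		lock_compound_mapcounts(page, &mapcounts);
		...
		unlock_compound_mapcounts(page, &mapcounts);
	} else if (PageCompound(page)) {
		/* pte-mapped subpage: adjust subpages_mapcount and the
		 * subpage's _mapcount under the head's bit lock */
		struct page *head = compound_head(page);

		lock_compound_mapcounts(head, &mapcounts);
		...
		unlock_compound_mapcounts(head, &mapcounts);
	} else {
		/* small page: the usual atomic operation suffices */
		first = atomic_inc_and_test(&page->_mapcount);
	}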
Add page_dup_compound_rmap() to manage compound locking versus atomics in
sync with the rest. In particular, hugetlb pages are still using the
atomics: to avoid unnecessary interference there, and because they never
have subpage mappings; but this exception can easily be changed.
Conveniently, page_dup_compound_rmap() turns out to suit an anon THP's
__split_huge_pmd_locked() too.
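Abridged from the patch below, page_dup_compound_rmap() shows the pattern at
its simplest:

	void page_dup_compound_rmap(struct page *page, bool compound)
	{
		struct compound_mapcounts mapcounts;
		struct page *head;

		/* hugetlb still uses atomics, and never has subpage mappings */
		if (PageHuge(page)) {
			atomic_inc(compound_mapcount_ptr(page));
			return;
		}

		head = compound_head(page);
		lock_compound_mapcounts(head, &mapcounts);
		if (compound) {
			mapcounts.compound_mapcount++;
		} else {
			mapcounts.subpages_mapcount++;
			subpage_mapcount_inc(page);
		}
		unlock_compound_mapcounts(head, &mapcounts);
	}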
bit_spin_lock() is not popular with PREEMPT_RT folks: but PREEMPT_RT
sensibly excludes TRANSPARENT_HUGEPAGE already, so its only exposure is to
the non-hugetlb non-THP pte-mapped compound pages (with large folios being
currently dependent on TRANSPARENT_HUGEPAGE). There is never any scan of
subpages in this case; but we have chosen to use PageCompound tests rather
than PageTransCompound tests to gate the use of lock_compound_mapcounts(),
so that page_mapped() is correct on all compound pages, whether or not
TRANSPARENT_HUGEPAGE is enabled: could that be a problem for PREEMPT_RT,
when there is contention on the lock - under heavy concurrent forking for
example? If so, then it can be turned into a sleeping lock (like
folio_lock()) when PREEMPT_RT.
A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
18 seconds on small pages, and used to take 1 second on huge pages, but
now takes 115 milliseconds on huge pages. Mapping by pmds a second time
used to take 860ms and now takes 86ms; mapping by pmds after mapping by
ptes (when the scan is needed) used to take 870ms and now takes 495ms.
Mapping huge pages by ptes is largely unaffected but variable: between 5%
faster and 5% slower in what I've recorded. Contention on the lock is
likely to behave worse than contention on the atomics behaved.
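For reference, the timing loop described above can be approximated by a
harness along these lines (a hypothetical sketch, not the script actually
used: MAP_ANONYMOUS|MAP_SHARED stands in for an explicit tmpfs file, and
whether the huge page paths are exercised depends on the shmem/tmpfs huge
page settings in effect):

	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#define SIZE	(2UL << 30)	/* 2GB */

	int main(void)
	{
		struct timespec t0, t1;
		int i;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < 100; i++) {
			/* populate at mmap() so the page_add_*_rmap costs are paid here */
			void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				       MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
			if (p == MAP_FAILED)
				return 1;
			/* munmap() pays the page_remove_rmap costs being measured */
			munmap(p, SIZE);
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		printf("%.3f seconds\n", (t1.tv_sec - t0.tv_sec) +
					 (t1.tv_nsec - t0.tv_nsec) / 1e9);
		return 0;
	}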
Link: https://lkml.kernel.org/r/1b42bd1a-8223-e827-602f-d466c2db7d3c@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/rmap.c')
-rw-r--r--	mm/rmap.c	333
1 file changed, 188 insertions, 145 deletions
diff --git a/mm/rmap.c b/mm/rmap.c
index f43339ea4970..512e53cae2ca 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1085,11 +1085,66 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 	return page_vma_mkclean_one(&pvmw);
 }
 
+struct compound_mapcounts {
+	unsigned int compound_mapcount;
+	unsigned int subpages_mapcount;
+};
+
+/*
+ * lock_compound_mapcounts() first locks, then copies subpages_mapcount and
+ * compound_mapcount from head[1].compound_mapcount and subpages_mapcount,
+ * converting from struct page's internal representation to logical count
+ * (that is, adding 1 to compound_mapcount to hide its offset by -1).
+ */
+static void lock_compound_mapcounts(struct page *head,
+		struct compound_mapcounts *local)
+{
+	bit_spin_lock(PG_locked, &head[1].flags);
+	local->compound_mapcount = atomic_read(compound_mapcount_ptr(head)) + 1;
+	local->subpages_mapcount = atomic_read(subpages_mapcount_ptr(head));
+}
+
+/*
+ * After caller has updated subpage._mapcount, local subpages_mapcount and
+ * local compound_mapcount, as necessary, unlock_compound_mapcounts() converts
+ * and copies them back to the compound head[1] fields, and then unlocks.
+ */
+static void unlock_compound_mapcounts(struct page *head,
+		struct compound_mapcounts *local)
+{
+	atomic_set(compound_mapcount_ptr(head), local->compound_mapcount - 1);
+	atomic_set(subpages_mapcount_ptr(head), local->subpages_mapcount);
+	bit_spin_unlock(PG_locked, &head[1].flags);
+}
+
+/*
+ * When acting on a compound page under lock_compound_mapcounts(), avoid the
+ * unnecessary overhead of an actual atomic operation on its subpage mapcount.
+ * Return true if this is the first increment or the last decrement
+ * (remembering that page->_mapcount -1 represents logical mapcount 0).
+ */
+static bool subpage_mapcount_inc(struct page *page)
+{
+	int orig_mapcount = atomic_read(&page->_mapcount);
+
+	atomic_set(&page->_mapcount, orig_mapcount + 1);
+	return orig_mapcount < 0;
+}
+
+static bool subpage_mapcount_dec(struct page *page)
+{
+	int orig_mapcount = atomic_read(&page->_mapcount);
+
+	atomic_set(&page->_mapcount, orig_mapcount - 1);
+	return orig_mapcount == 0;
+}
+
 /*
  * When mapping a THP's first pmd, or unmapping its last pmd, if that THP
  * also has pte mappings, then those must be discounted: in order to maintain
  * NR_ANON_MAPPED and NR_FILE_MAPPED statistics exactly, without any drift,
  * and to decide when an anon THP should be put on the deferred split queue.
+ * This function must be called between lock_ and unlock_compound_mapcounts().
  */
 static int nr_subpages_unmapped(struct page *head, int nr_subpages)
 {
@@ -1103,6 +1158,40 @@ static int nr_subpages_unmapped(struct page *head, int nr_subpages)
 	return nr;
 }
 
+/*
+ * page_dup_compound_rmap(), used when copying mm, or when splitting pmd,
+ * provides a simple example of using lock_ and unlock_compound_mapcounts().
+ */
+void page_dup_compound_rmap(struct page *page, bool compound)
+{
+	struct compound_mapcounts mapcounts;
+	struct page *head;
+
+	/*
+	 * Hugetlb pages could use lock_compound_mapcounts(), like THPs do;
+	 * but at present they are still being managed by atomic operations:
+	 * which are likely to be somewhat faster, so don't rush to convert
+	 * them over without evaluating the effect.
+	 *
+	 * Note that hugetlb does not call page_add_file_rmap():
+	 * here is where hugetlb shared page mapcount is raised.
+	 */
+	if (PageHuge(page)) {
+		atomic_inc(compound_mapcount_ptr(page));
+		return;
+	}
+
+	head = compound_head(page);
+	lock_compound_mapcounts(head, &mapcounts);
+	if (compound) {
+		mapcounts.compound_mapcount++;
+	} else {
+		mapcounts.subpages_mapcount++;
+		subpage_mapcount_inc(page);
+	}
+	unlock_compound_mapcounts(head, &mapcounts);
+}
+
 /**
  * page_move_anon_rmap - move a page to our anon_vma
  * @page:	the page to move to our anon_vma
@@ -1212,7 +1301,8 @@ static void __page_check_anon_rmap(struct page *page,
 void page_add_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address, rmap_t flags)
 {
-	int nr, nr_pages;
+	struct compound_mapcounts mapcounts;
+	int nr = 0, nr_pmdmapped = 0;
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
@@ -1222,33 +1312,37 @@ void page_add_anon_rmap(struct page *page,
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 
 	if (compound && PageTransHuge(page)) {
-		atomic_t *mapcount;
-		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		mapcount = compound_mapcount_ptr(page);
-		first = atomic_inc_and_test(mapcount);
+		lock_compound_mapcounts(page, &mapcounts);
+		first = !mapcounts.compound_mapcount;
+		mapcounts.compound_mapcount++;
+		if (first) {
+			nr = nr_pmdmapped = thp_nr_pages(page);
+			if (mapcounts.subpages_mapcount)
+				nr = nr_subpages_unmapped(page, nr_pmdmapped);
+		}
+		unlock_compound_mapcounts(page, &mapcounts);
 
-		nr = nr_pages = thp_nr_pages(page);
-		if (first && head_subpages_mapcount(page))
-			nr = nr_subpages_unmapped(page, nr_pages);
-	} else {
-		nr = 1;
-		if (PageTransCompound(page)) {
-			struct page *head = compound_head(page);
+	} else if (PageCompound(page)) {
+		struct page *head = compound_head(page);
 
-			atomic_inc(subpages_mapcount_ptr(head));
-			nr = !head_compound_mapcount(head);
-		}
+		lock_compound_mapcounts(head, &mapcounts);
+		mapcounts.subpages_mapcount++;
+		first = subpage_mapcount_inc(page);
+		nr = first && !mapcounts.compound_mapcount;
+		unlock_compound_mapcounts(head, &mapcounts);
+
+	} else {
 		first = atomic_inc_and_test(&page->_mapcount);
+		nr = first;
 	}
 
 	VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page);
 	VM_BUG_ON_PAGE(!first && PageAnonExclusive(page), page);
 
-	if (first) {
-		if (compound)
-			__mod_lruvec_page_state(page, NR_ANON_THPS, nr_pages);
+	if (nr_pmdmapped)
+		__mod_lruvec_page_state(page, NR_ANON_THPS, nr_pmdmapped);
+	if (nr)
 		__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
-	}
 
 	if (unlikely(PageKsm(page)))
 		unlock_page_memcg(page);
@@ -1308,39 +1402,41 @@ void page_add_new_anon_rmap(struct page *page,
 void page_add_file_rmap(struct page *page,
 	struct vm_area_struct *vma, bool compound)
 {
-	int nr = 0;
+	struct compound_mapcounts mapcounts;
+	int nr = 0, nr_pmdmapped = 0;
+	bool first;
 
 	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
 	lock_page_memcg(page);
+
 	if (compound && PageTransHuge(page)) {
-		int nr_pages;
+		lock_compound_mapcounts(page, &mapcounts);
+		first = !mapcounts.compound_mapcount;
+		mapcounts.compound_mapcount++;
+		if (first) {
+			nr = nr_pmdmapped = thp_nr_pages(page);
+			if (mapcounts.subpages_mapcount)
+				nr = nr_subpages_unmapped(page, nr_pmdmapped);
+		}
+		unlock_compound_mapcounts(page, &mapcounts);
 
-		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
-			goto out;
+	} else if (PageCompound(page)) {
+		struct page *head = compound_head(page);
 
-		nr = nr_pages = thp_nr_pages(page);
-		if (head_subpages_mapcount(page))
-			nr = nr_subpages_unmapped(page, nr_pages);
+		lock_compound_mapcounts(head, &mapcounts);
+		mapcounts.subpages_mapcount++;
+		first = subpage_mapcount_inc(page);
+		nr = first && !mapcounts.compound_mapcount;
+		unlock_compound_mapcounts(head, &mapcounts);
 
-		if (PageSwapBacked(page))
-			__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
-						nr_pages);
-		else
-			__mod_lruvec_page_state(page, NR_FILE_PMDMAPPED,
-						nr_pages);
 	} else {
-		bool pmd_mapped = false;
-
-		if (PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-
-			atomic_inc(subpages_mapcount_ptr(head));
-			pmd_mapped = head_compound_mapcount(head);
-		}
-		if (atomic_inc_and_test(&page->_mapcount) && !pmd_mapped)
-			nr++;
+		first = atomic_inc_and_test(&page->_mapcount);
+		nr = first;
 	}
-out:
+
+	if (nr_pmdmapped)
+		__mod_lruvec_page_state(page, PageSwapBacked(page) ?
+			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
 	if (nr)
 		__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
 	unlock_page_memcg(page);
@@ -1348,137 +1444,84 @@ out:
 	mlock_vma_page(page, vma, compound);
 }
 
-static void page_remove_file_rmap(struct page *page, bool compound)
+/**
+ * page_remove_rmap - take down pte mapping from a page
+ * @page:	page to remove mapping from
+ * @vma:	the vm area from which the mapping is removed
+ * @compound:	uncharge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_remove_rmap(struct page *page,
+	struct vm_area_struct *vma, bool compound)
 {
-	int nr = 0;
+	struct compound_mapcounts mapcounts;
+	int nr = 0, nr_pmdmapped = 0;
+	bool last;
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
-	/* Hugepages are not counted in NR_FILE_MAPPED for now. */
+	/* Hugetlb pages are not counted in NR_*MAPPED */
 	if (unlikely(PageHuge(page))) {
 		/* hugetlb pages are always mapped with pmds */
 		atomic_dec(compound_mapcount_ptr(page));
 		return;
 	}
 
+	lock_page_memcg(page);
+
 	/* page still mapped by someone else? */
 	if (compound && PageTransHuge(page)) {
-		int nr_pages;
+		lock_compound_mapcounts(page, &mapcounts);
+		mapcounts.compound_mapcount--;
+		last = !mapcounts.compound_mapcount;
+		if (last) {
+			nr = nr_pmdmapped = thp_nr_pages(page);
+			if (mapcounts.subpages_mapcount)
+				nr = nr_subpages_unmapped(page, nr_pmdmapped);
+		}
+		unlock_compound_mapcounts(page, &mapcounts);
 
-		if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
-			return;
+	} else if (PageCompound(page)) {
+		struct page *head = compound_head(page);
 
-		nr = nr_pages = thp_nr_pages(page);
-		if (head_subpages_mapcount(page))
-			nr = nr_subpages_unmapped(page, nr_pages);
+		lock_compound_mapcounts(head, &mapcounts);
+		mapcounts.subpages_mapcount--;
+		last = subpage_mapcount_dec(page);
+		nr = last && !mapcounts.compound_mapcount;
+		unlock_compound_mapcounts(head, &mapcounts);
 
-		if (PageSwapBacked(page))
-			__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
-						-nr_pages);
-		else
-			__mod_lruvec_page_state(page, NR_FILE_PMDMAPPED,
-						-nr_pages);
 	} else {
-		bool pmd_mapped = false;
-
-		if (PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-
-			atomic_dec(subpages_mapcount_ptr(head));
-			pmd_mapped = head_compound_mapcount(head);
-		}
-		if (atomic_add_negative(-1, &page->_mapcount) && !pmd_mapped)
-			nr++;
+		last = atomic_add_negative(-1, &page->_mapcount);
+		nr = last;
 	}
 
-	if (nr)
-		__mod_lruvec_page_state(page, NR_FILE_MAPPED, -nr);
-}
-
-static void page_remove_anon_compound_rmap(struct page *page)
-{
-	int nr, nr_pages;
-
-	if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
-		return;
-
-	/* Hugepages are not counted in NR_ANON_PAGES for now. */
-	if (unlikely(PageHuge(page)))
-		return;
-
-	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
-		return;
-
-	nr = nr_pages = thp_nr_pages(page);
-	__mod_lruvec_page_state(page, NR_ANON_THPS, -nr);
-
-	if (head_subpages_mapcount(page)) {
-		nr = nr_subpages_unmapped(page, nr_pages);
-
+	if (nr_pmdmapped) {
+		__mod_lruvec_page_state(page, PageAnon(page) ? NR_ANON_THPS :
+				(PageSwapBacked(page) ? NR_SHMEM_PMDMAPPED :
+				NR_FILE_PMDMAPPED), -nr_pmdmapped);
+	}
+	if (nr) {
+		__mod_lruvec_page_state(page, PageAnon(page) ? NR_ANON_MAPPED :
+				NR_FILE_MAPPED, -nr);
 		/*
-		 * Queue the page for deferred split if at least one small
+		 * Queue anon THP for deferred split if at least one small
 		 * page of the compound page is unmapped, but at least one
 		 * small page is still mapped.
 		 */
-		if (nr && nr < nr_pages)
-			deferred_split_huge_page(page);
-	}
-
-	if (nr)
-		__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
-}
-
-/**
- * page_remove_rmap - take down pte mapping from a page
- * @page:	page to remove mapping from
- * @vma:	the vm area from which the mapping is removed
- * @compound:	uncharge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_remove_rmap(struct page *page,
-	struct vm_area_struct *vma, bool compound)
-{
-	bool pmd_mapped = false;
-
-	lock_page_memcg(page);
-
-	if (!PageAnon(page)) {
-		page_remove_file_rmap(page, compound);
-		goto out;
+		if (PageTransCompound(page) && PageAnon(page))
+			if (!compound || nr < nr_pmdmapped)
+				deferred_split_huge_page(compound_head(page));
 	}
 
-	if (compound) {
-		page_remove_anon_compound_rmap(page);
-		goto out;
-	}
-
-	if (PageTransCompound(page)) {
-		struct page *head = compound_head(page);
-
-		atomic_dec(subpages_mapcount_ptr(head));
-		pmd_mapped = head_compound_mapcount(head);
-	}
-
-	/* page still mapped by someone else? */
-	if (!atomic_add_negative(-1, &page->_mapcount) || pmd_mapped)
-		goto out;
-
-	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
-
-	if (PageTransCompound(page))
-		deferred_split_huge_page(compound_head(page));
-
 	/*
-	 * It would be tidy to reset the PageAnon mapping here,
+	 * It would be tidy to reset PageAnon mapping when fully unmapped,
	 * but that might overwrite a racing page_add_anon_rmap
	 * which increments mapcount after us but sets mapping
-	 * before us: so leave the reset to free_unref_page,
+	 * before us: so leave the reset to free_pages_prepare,
	 * and remember that it's only reliable while mapped.
-	 * Leaving it set also helps swapoff to reinstate ptes
-	 * faster for those pages still in swapcache.
	 */
-out:
+
 	unlock_page_memcg(page);
 	munlock_vma_page(page, vma, compound);
 }