An investigation by our CSS teams found that the workload was hitting contention on a memory object that is not partitioned in SQL Server 2008 R2, which limited the concurrency and scalability of the application.
The software may work, but it may not scale, and on some occasions it may perform poorly, especially if you are hitting memory object scaling issues (CMEMTHREAD waits). On the high-end server you now have more resources to run concurrent threads in parallel, but all of those threads wait on a single memory object for memory, and that object becomes the point of contention and the bottleneck.
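One way to check whether a workload is running into this kind of contention is to look at the accumulated CMEMTHREAD waits. The query below is a minimal sketch that assumes you have VIEW SERVER STATE permission. The counters in `sys.dm_os_wait_stats` are cumulative since the last restart (or since the stats were cleared), so compare two snapshots taken under load rather than reading a single value in isolation.

```sql
-- Cumulative CMEMTHREAD waits since the instance started
-- (or since the wait statistics were last cleared).
SELECT
    wait_type,
    waiting_tasks_count,                       -- number of waits on this type
    wait_time_ms,                              -- total wait time, including signal wait
    max_wait_time_ms,                          -- longest single wait
    signal_wait_time_ms,                       -- time spent waiting for CPU after being signaled
    wait_time_ms * 1.0
        / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'CMEMTHREAD';
```

A count that climbs quickly between snapshots while concurrency is high suggests that threads are queuing on a shared memory object, although it does not by itself identify which object is involved.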
In our lab, we identified a session state table and a very frequently executed stored procedure as good candidates for conversion to an in-memory table, using the approach documented here.
This change further improved the scalability of the application: we sustained consistent response times under a higher workload because there was no logging overhead or tempdb allocation overhead.
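To illustrate the kind of conversion described above, here is a hedged sketch of a non-durable memory-optimized table with a natively compiled procedure on SQL Server 2016. The object names (`dbo.SessionState`, `dbo.usp_TouchSession`), the columns, and the bucket count are hypothetical and are not taken from the application we tested. The sketch also assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup, and that losing the table contents on restart is acceptable, which is what `DURABILITY = SCHEMA_ONLY` implies.

```sql
-- Hypothetical session state table. SCHEMA_ONLY durability means rows are
-- never written to the transaction log and are lost on restart.
CREATE TABLE dbo.SessionState
(
    SessionId NVARCHAR(88)   NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Created   DATETIME2      NOT NULL,
    Expires   DATETIME2      NOT NULL,
    ItemData  VARBINARY(MAX) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Hypothetical hot-path procedure, natively compiled so that it runs
-- against the memory-optimized table without interpreted T-SQL overhead.
CREATE PROCEDURE dbo.usp_TouchSession
    @SessionId NVARCHAR(88),
    @Expires   DATETIME2
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    UPDATE dbo.SessionState
    SET Expires = @Expires
    WHERE SessionId = @SessionId;
END;
GO
```

The bucket count should be sized to roughly the expected number of distinct sessions; a value that is far too small leads to long hash chains and slower lookups.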
We ran application performance and stress tests with an increased workload (22-25K Batch Requests/sec), first on SQL 2008/Windows 2008 R2 and then, after upgrading, on SQL 2016/Windows Server 2016. Keeping the hardware and application unchanged let us perform A/B testing.
The following are the results of A/B testing the application under the same workload on SQL 2016 vs. SQL 2008.