We need a model for fetching person and company profile data that:
- includes the key entity identifiers in the response payload for each profile found (slug/flagship URL, public ID, member ID, and ideally SN ID too)
- includes the last scraped timestamp so we know how old the profile data is
- makes commercial sense for our business model (e.g. is pay per usage)
- is performant (fast to query and receive results)
- is robust (not relying on real-time web scraping means less risk of downtime caused by third-party changes)
Data freshness matters less than the requirements above. Ideally data would be no more than 30-90 days old for active or recently modified profiles, but it does not need to be up to date in real time, as we can use the existing real-time API endpoints when real-time profile data is required.
In short, we’d like a way to query the dataset directly, even if it’s updated less frequently, as long as the experience is faster, cheaper and more stable.
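
To make the identifier and timestamp requirements concrete, here is a minimal sketch of what one found profile could look like on our side, plus the freshness check we would run against it. All names here (`ProfileRecord`, `is_fresh`, the field names) are hypothetical and only illustrate the requirements above; they are not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ProfileRecord:
    """Hypothetical shape of one found profile; field names are illustrative."""
    slug: str                      # slug / flagship URL
    public_id: str
    member_id: str
    last_scraped_at: datetime      # timezone-aware timestamp of the last scrape
    sn_id: Optional[str] = None    # "ideally" included, so optional in this sketch


def is_fresh(record: ProfileRecord, max_age_days: int = 90,
             now: Optional[datetime] = None) -> bool:
    """Return True if the record was scraped within the last max_age_days."""
    now = now or datetime.now(timezone.utc)
    return now - record.last_scraped_at <= timedelta(days=max_age_days)
```

With a check like this, a record that fails `is_fresh` would be the case where we fall back to the existing real-time API endpoints.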