20190903 - Univ Search NER Ingestion - Execution Plan Sync
Design docs: https://docs.google.com/document/d/1Ri3c7SjgWWR9cdWwIXUP5uGg_oy5Y9ew4_XzkHmlIQs/edit?usp=sharing
Attendees: @arinto, @akurniawan, @ismail ner, @Prahlad Ram, @Niraj
Discussion points
Here are the high-level tasks for this ingestion:
- Scraper execution (from the Elasticsearch cluster to a dump file in S3)
  - Test in stg/dev against the stg Elasticsearch cluster before going to prod
  - Use elasticsearch-dump
  - Run on Bastion or in Lambda
  - Use the scroll API (no heavy performance penalty) with batch size 100, and run the script during off-peak hours
  - Ticket to Local Team has been raised: https://29022131.atlassian.net/browse/LSD-44
  - Need SRS Devops help to set up the infra (captured in Action Items below)
- Cloud Data Transfer (from S3 in AWS to GCP)
- List of required fields
  - Is it finalized? Yes, and it has been reviewed by Niraj.
  - At dump time, we can specify which fields to include via elasticsearch-dump.
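
A rough sketch of how the dump command could be assembled, assuming the elasticdump CLI from elasticsearch-dump and its --input, --output, --type, --limit and --searchBody options (worth verifying against the installed version). The host, index, bucket and field names below are placeholders, not the agreed values, and S3 credentials are deliberately left out since they depend on the STS key / AWS role setup.

```python
import json
import shlex


def build_dump_command(es_url, index, s3_uri, fields, batch_size=100):
    """Return an elasticdump argument list for a field-limited data dump.

    Only builds the command; running it is left to the caller.
    """
    # _source filtering restricts each exported document to the required fields.
    search_body = {"query": {"match_all": {}}, "_source": list(fields)}
    return [
        "elasticdump",
        f"--input={es_url.rstrip('/')}/{index}",
        f"--output={s3_uri}",
        "--type=data",
        # Number of documents fetched per scroll batch (100 as discussed).
        f"--limit={batch_size}",
        f"--searchBody={json.dumps(search_body)}",
    ]


if __name__ == "__main__":
    # All values here are hypothetical placeholders for the stg test.
    cmd = build_dump_command(
        es_url="http://es-stg.example.internal:9200",
        index="example-index",
        s3_uri="s3://example-bucket/ner-dump.json",
        fields=["name", "category"],
    )
    # Print a shell-safe version of the command for review before running it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on Bastion the same list could be passed to subprocess.run once connectivity and credentials are in place.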
Action items
- SRS Devops (Prahlad) will provide the following infra setup by tomorrow for the stg test:
  - The S3 bucket, and connectivity from Bastion to S3
  - STS key
  - AWS Role
- NVS (Arinto) will liaise with Local Team (Evan) on executing the scraper in stg