h1. Issue Details
h2. Summary
An edit to geo data triggered an infinite loop and a high open-file count, which crashed the backstage clusters (NE, APRSAPI) and, in turn, the products that depend on them: primarily Hotel, Hotel+Flight, and Experience, and to a lesser extent Train, Flight, and Connectivity. On Sept 17 at ~11 PM all problematic geo data had been corrected, but the geo cache in aprsg-memcached, which aprsapi-app currently reads from, still contained the wrong data because it keeps entries for up to 6 hours. Flushing the memcache was therefore the only reasonable way to make all clusters that process cached geo data available again.
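
The staleness described above can be illustrated with a minimal sketch. It is not the aprsapi-app or aprsg-memcached code: the {{TTLCache}} class, the {{geo-1}} key, and the fake clock are hypothetical, and only the 6-hour retention comes from this ticket. It shows why a corrected source record stays invisible to readers until the entry expires or the cache is flushed.

```python
import time

SIX_HOURS = 6 * 60 * 60  # retention stated in the summary


class TTLCache:
    """Hypothetical in-process stand-in for a TTL-based cache such as memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, time the value was cached)

    def get(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still within TTL: serve the cached copy
        value = loader(key)  # missing or expired: reload from the source
        self._store[key] = (value, now)
        return value

    def flush(self):
        self._store.clear()


# Simulated timeline with a controllable clock (illustrative values only).
now = [0.0]
db = {"geo-1": "bad"}
cache = TTLCache(SIX_HOURS, clock=lambda: now[0])

first = cache.get("geo-1", db.__getitem__)  # bad record gets cached
db["geo-1"] = "fixed"                       # source is corrected
now[0] += 45 * 60                           # 45 minutes later
stale = cache.get("geo-1", db.__getitem__)  # cache still serves the old record
cache.flush()
fresh = cache.get("geo-1", db.__getitem__)  # reloaded from the corrected source
```

Without the flush, the corrected record would only be picked up once the cached entry aged past the 6-hour limit.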

h2. Chronology
Timezone: GMT+7
h3. 2017-09-17

08:30 -- All systems were up and products were healthy. Non-Indonesia geos were disabled; all NE instances were in use.
13:00 -- neA, which contains various fixes from BEI, was serving production; neB was prepared as a hot backup.
13:08 -- Hotel went down; the aprsapi lbint contained no healthy instances.
13:47 -- Hotel product recovered.
17:00 -- Enablement of all geos started.
17:01 -- A fix was released to aprsapi.
17:28 -- All geos were enabled
17:38 -- Hotel and Experience products went down. All geos except ID were disabled.
17:58 -- All aprsapi instances were restarted.
16:10 -- Hotel APIs were opened, using neB. The Hotel detail page was not working. aprsapi instances went down one by one.
16:12 -- Hotel went down; APIs were routed off.
16:20 -- aprsapi was rolled back to the release version from 3 weeks ago.
16:27 -- Hotel was back online.
19:52 -- Hotel product went down; all aprsapi instances were tagged unhealthy. NE was fine.
20:06 -- Bad geo data was identified as the suspect; 2 problematic geoids were found.
20:46 -- Geo data was corrected in NE's DB; neA was restarted.
21:18 -- Hotel was back online.
22:30 -- A fix preventing the infinite loop in aprsapi was released; Hotel product was stable.
22:49 -- Another source of the infinite loop was discovered. aprsapi was also using cached geo data, and the 2 problematic geo entries were still cached.
23:28 -- A memcache flush request was created.
23:45 -- Flush was done. Hotel product has been stable since.
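
The ticket does not describe the shape of the bad geo data or the fix released at 22:30. As a hedged illustration only, assuming the problematic geoids formed a cycle in a parent-child hierarchy, a loop guard of the following kind would turn an infinite traversal into a fast failure. The function name {{ancestors}}, the {{parent_of}} mapping, and the {{max_depth}} limit are hypothetical.

```python
def ancestors(geo_id, parent_of, max_depth=32):
    """Return the chain of parent ids above geo_id, nearest parent first.

    parent_of maps a geo id to its parent id (or None at the root).
    Raises ValueError instead of looping forever when the data contains
    a cycle or the chain is deeper than max_depth.
    """
    seen = {geo_id}
    chain = []
    current = parent_of.get(geo_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in geo hierarchy at {current!r}")
        if len(chain) >= max_depth:
            raise ValueError(f"geo hierarchy deeper than {max_depth}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return chain


# Well-formed hierarchy (illustrative ids, not real geoids).
good = {"city": "region", "region": "country", "country": None}
good_chain = ancestors("city", good)

# Hypothetical bad edit: two records point at each other.
bad = {"a": "b", "b": "a"}
try:
    ancestors("a", bad)
    cycle_detected = False
except ValueError:
    cycle_detected = True
```

Failing fast on a single bad record keeps the request thread from spinning, so one bad edit cannot exhaust an instance's resources.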

Incident ticket