Intermittent Gmail Sync Failures #12336

Closed
mdvertola opened this issue May 28, 2025 · 10 comments

@mdvertola
Contributor

Bug Description

I have a self-hosted deployment running v0.54. It currently has about 10k contacts and around 8k companies, all imported via the Gmail sync.

About 5 users have imported their Gmail accounts into it. Three of them report their email sitting in a 'syncing' state for several hours. During that time we seem to import 300-1k emails/companies, but the sync eventually fails. On the other hand, some users were able to import 6-7k contacts/companies without any issue.

My deployment uses AWS ECS with one server container, one worker container, AWS Redis, and AWS Postgres. No services or databases are resource-constrained during these syncs.

I see two error messages in the worker node:

  • (TEMPORARY_ERROR) importing messages for workspace {ID} and account {ID}: Too many concurrent requests for user. for message with externalId: {ID} - undefined
  • ERROR [MessagingMessagesImportService] Error (undefined) importing messages for workspace {ID} and account {ID}: Invalid email format - undefined

I assume we first hit the "Too many concurrent requests" limit, which is then followed by an invalid (i.e. undefined) format error because the import got rate limited. I read through the self-hosting docs here: https://twenty.com/developers/section/self-hosting/setup Are there any (undocumented) ways to adjust this rate limit?

If there is no way to adjust this on the Twenty side, I can likely adjust the Google settings to raise the rate limits. Can someone provide an estimate of the rate at which Twenty calls the Gmail API during a sync for a single account? Any insight into that rate when multiple accounts sync at the same time would be helpful as well.

More generally, it may also be worth looking into some sort of backoff or better error handling for when the rate limit is hit.
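
To make the backoff suggestion concrete, here is a minimal sketch of retrying with exponential backoff and jitter. Everything in it is an assumption for illustration: `withBackoff`, `BackoffOptions`, and `isRateLimitError` are hypothetical names, not part of Twenty's codebase, and matching on the error message is only a guess based on the log line above.

```typescript
// Hypothetical helper, not part of Twenty: retries an async call with
// exponential backoff and jitter when the error looks like a rate limit.
type BackoffOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay ceiling before the first retry
  maxDelayMs: number; // upper bound on any single delay
};

// Assumption: a rate-limit error can be recognised from the message in the logs.
const isRateLimitError = (error: unknown): boolean =>
  error instanceof Error && /too many concurrent requests/i.test(error.message);

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(
  fn: () => Promise<T>,
  options: BackoffOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on non-rate-limit errors or once the attempts are exhausted.
      if (!isRateLimitError(error) || attempt >= options.maxAttempts) {
        throw error;
      }
      // The ceiling doubles on every retry, capped at maxDelayMs.
      const ceiling = Math.min(
        options.baseDelayMs * 2 ** (attempt - 1),
        options.maxDelayMs,
      );
      // "Full jitter": wait a random time between 0 and the ceiling.
      await sleep(Math.random() * ceiling);
    }
  }
}
```

A caller would wrap each provider request, for example `withBackoff(() => fetchMessage(id), { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 30000 })`, where `fetchMessage` is likewise a placeholder. Errors that are not rate limits are rethrown immediately, so they still surface in the logs.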

@mdvertola
Contributor Author

I am still seeing this behavior and only seeing the following message:
MessageImportException [Error]: Invalid email format

Is there a chance that this error comes from users quite literally having some email whose subject/body/metadata contains invalid characters or something else that breaks a data model?

Also, if there is anything else I can provide in more detail, please let me know.

@mdvertola
Contributor Author

I have been able to reproduce this error on the Twenty SaaS offering (i.e. not self-hosted). @Bonapara I think it would make more sense to tag this as a bug.

@guillim
Contributor

guillim commented Jun 2, 2025

To answer your question about rate limiting: this can happen if you have multiple apps trying to sync your emails at the same time. For instance, if you sync your email from the Twenty cloud (https://app.twenty.com/) and at the same time from your local instance (http://localhost:4200/), you might end up this way. Note that syncing at the same time with other apps, such as n8n or any other sync tool, does not help either.

This issue, which we saw in the past, does not really affect us any longer, but you can have a look at variables in the codebase such as MESSAGING_GMAIL_USERS_MESSAGES_GET_BATCH_SIZE and decrease them.
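
As a rough illustration of why a smaller batch size reduces pressure on the provider, the sketch below fetches message IDs in sequential batches, so at most `batchSize` requests are in flight at once. This is not Twenty's implementation: `chunk`, `fetchInBatches`, and `fetchOne` are hypothetical names, and how MESSAGING_GMAIL_USERS_MESSAGES_GET_BATCH_SIZE is actually consumed is not shown in this thread.

```typescript
// Hypothetical illustration only; this is not Twenty's implementation.

// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs batches one after another; requests inside a batch run concurrently,
// so the batch size is the upper bound on simultaneous requests.
async function fetchInBatches<T>(
  ids: string[],
  batchSize: number,
  fetchOne: (id: string) => Promise<T>, // stand-in for a provider call
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of chunk(ids, batchSize)) {
    const fetched = await Promise.all(batch.map((id) => fetchOne(id)));
    results.push(...fetched);
  }
  return results;
}
```

With this shape, lowering the batch size trades sync speed for fewer simultaneous requests per user.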

@guillim
Contributor

guillim commented Jun 2, 2025

I am very interested in the second bug you noticed: "MessageImportException [Error]: Invalid email format".
Could you tell me more about the stack trace (the full error logs) so that I can understand where it comes from? I see MessagingMessagesImportService, but maybe there are more clues.

The best would be if you could reproduce the issue and find exactly which email causes the problem!

@mdvertola
Contributor Author

mdvertola commented Jun 2, 2025

@guillim I am still seeing occasional concurrent-request rate limits, even though nothing else (within my control) is touching the user's inbox at the time of the sync (i.e. no local deployments, other workflows, etc.), but we can leave that be for now.

Please see my PR: #12383

I believe this issue was a result of addressparser choking on certain email addresses that may be malformed per its rulesets. When it came across something it considered malformed, it threw an exception and killed the MessagingMessagesImportService process. The PR creates a "safeParseAddress" function that gracefully catches those errors so that, hopefully, the entire MessagingMessagesImportService process is no longer killed; instead, emails it deems invalid are simply not imported.

I have built the image from source and deployed it to my self-hosted Twenty. It appears to be working for the first couple thousand emails in my first test inbox, but I am still testing a few different inboxes.
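
The PR's actual code is not reproduced in this thread, so the following is only a minimal sketch of the idea under stated assumptions: the parser is passed in as a hypothetical `parse` function (standing in for addressparser), and `ParsedAddress` is an assumed result shape rather than the library's real type.

```typescript
// Assumed result shape for illustration; the real library's type may differ.
type ParsedAddress = { name: string; address: string };

// Stand-in signature for the underlying parser (e.g. addressparser).
type AddressParser = (raw: string) => ParsedAddress[];

// Wraps the parser so that a single malformed header yields an empty list
// instead of an exception that aborts the whole import.
function safeParseAddress(raw: string, parse: AddressParser): ParsedAddress[] {
  try {
    // Drop entries without a usable address so callers can skip them.
    return parse(raw).filter((entry) => entry.address.trim() !== '');
  } catch (error) {
    console.warn(`Skipping unparsable address header: ${String(error)}`);
    return [];
  }
}
```

The design choice here is that a parse failure is treated as "no participants found" for that one header, so the rest of the sync can continue.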

@guillim
Contributor

guillim commented Jun 2, 2025

Good PR.

I would like to reproduce this locally, so can you give me examples of malformed email addresses that make addressparser throw?

@mdvertola
Contributor Author

mdvertola commented Jun 2, 2025

I have only really found this through real-world inbox syncs that I am unable to run locally or provide logs for. But here are some theoretical cases that might be decent to evaluate against:

| # | Address | Why it’s valid (per RFC 5322) | Typical addressparser behaviour |
|---|---|---|---|
| 1 | `"[email protected]"@example.com` | Quoted local-part may legally contain an @. | Splits on the first @, returning `{ address: '[email protected]', name: '@example.com' }` instead of a single mailbox. |
| 2 | `"john..doe"@example.org` | Consecutive dots are allowed inside quotes. | Sees the double dot and discards the address (returns false / []). |
| 3 | `postmaster@[IPv6:2001:db8::1]` | Domain-literals (including IPv6) are expressly permitted. | Strips the square brackets, then rejects the bare IPv6 string as an invalid hostname. |
| 4 | `[email protected]` | “Bang paths” (!) are still legal in the local-part. | Treats ! as an illegal character and drops the whole token. |
| 5 | `user%[email protected]` | % is another historic routing character that’s valid in the local-part. | Splits at the second @, keeping only `user%example.com` and losing the real destination domain. |

| # | Address | Why it’s valid (RFC 6532 / EAI) | Typical addressparser behaviour |
|---|---|---|---|
| 6 | `"测试"@example.com` | The local-part is a quoted UTF-8 string containing two Han characters. Quoting makes any UTF-8 sequence legal under RFC 6532. | Returns an empty list ([]) because non-ASCII bytes in the local-part are rejected. |
| 7 | `张伟@example.com` | Plain (unquoted) UTF-8 local-part with two Chinese characters; fully legal for SMTP servers that advertise SMTPUTF8. | Drops the address as invalid; addressparser accepts only ASCII dot-atoms unless quoted. |
| 8 | `用户@例子.公司` | Both local-part and domain are Unicode. The domain is valid once converted to Punycode (xn--...). | Punycode-encodes the domain but still flags the address invalid because of the non-ASCII local-part. |
| 9 | `"王..小"@例子.测试` | Quoted local-part contains consecutive dots and Han characters, which are allowed in quotes. Domain is an IDN TLD. | Splits on the dots and returns garbage tokens or an empty array. |

| # | Address / Pattern | Why it’s valid (per RFC 5322/6532) | Typical addressparser behaviour |
|---|---|---|---|
| 1 | `"John Doe" (CEO) <[email protected]>` and `[email protected] (this is a note)` | Comments (`(...)`) are allowed anywhere outside the core local-part@domain. | Either strips the comment but leaves dangling spaces / commas, or rejects the whole token. |
| 2 | `"Very Long Name"\r\n <[email protected]>` | Folded white-space (CRLF followed by SP/HTAB) is treated as a single space. | New-line breaks its tokenizer → returns only "Very Long Name" without the address, or throws. |
| 3 | `Friends: Alice <[email protected]>, Bob <[email protected]>, "C, D" <[email protected]>;` | Group syntax ends with a semicolon; everything inside is a member address. | Parses only the first mailbox, then stops at the semicolon. |
| 4 | `<@relay1,@relay2:[email protected]>` | Obsolete but still-valid source-route form. | Treats the leading @ or commas as fatal → empty result. |
| 5 | `"O\\\"Reilly, Tim" <[email protected]>` | Back-slash can escape a quote inside a quoted local-part or display name. | Returns the display name literally as `O\"Reilly, Tim` (keeps the back-slash). |
| 6 | `=?UTF-8?B?5L2g5aW9?= <[email protected]>` | Encoded-word syntax lets non-ASCII appear in headers without quotes. | Leaves the raw `=?UTF-8?...?=` token; no automatic decode. |
| 7 | `🍕@example.com` and `"🦄.unicode"@example.com` | RFC 6532 (EAI) permits UTF-8 in local-parts, quoted or unquoted. | Rejects as “non-ASCII in local-part”. |
| 8 | `user@[IPv6:fe80::1%en0]` | IPv6 domain-literal with a zone identifier after %. | Strips the `%en0`, then errors on bare `fe80::1` as bad host. |
| 9 | `"ends-with-backslash\\\\"@example.org` | A trailing back-slash can be escaped by another back-slash. | Thinks the final `\\` is unescaped → returns []. |
| 10 | `[email protected]; [email protected]` | The semicolon is an allowed list separator in many MUAs (e.g. Outlook). | Takes `[email protected];` verbatim, treating ; as part of the address. |

@guillim
Contributor

guillim commented Jun 2, 2025

I will take over from now on, if you don't mind, @mdvertola! This way it will follow what we already did for the Microsoft sanitization.

@guillim
Contributor

guillim commented Jun 3, 2025

FYI: this will be merged today.

Fixed by #12383

@guillim guillim moved this from 🔖 Planned to 👀 In review in 🎯 Roadmap & Sprints Jun 3, 2025
charlesBochet pushed a commit that referenced this issue Jun 3, 2025
I believe that some emails with invalid characters are breaking the sync
process.

this PR attempts to create a "safeParseAddress" function. Hopefully this
will change current behavior of a single email breaking the entire sync
process to the sync process "skipping" an invalid email address and
continuing on.

I opened this because of issues explained in #12336

---------

Co-authored-by: guillim <[email protected]>
abdulrahmancodes pushed a commit to abdulrahmancodes/twenty that referenced this issue Jun 3, 2025
@guillim guillim closed this as completed Jun 3, 2025
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in 🎯 Roadmap & Sprints Jun 3, 2025
@guillim
Contributor

guillim commented Jun 3, 2025

Thanks for your contribution @mdvertola !
