Commit 8082b4e
fix(datalake): address Copilot + gitar-bot findings on Iceberg ingestion
- Fix _is_json_lines false-positive: minified single-line Iceberg/Delta metadata
dicts were classified as JSONL, bypassing the raw_data gate entirely. Now all
three detection conditions (format-version, schema.fields, \) are checked.
- Move _ICEBERG_METADATA_RE and _update_iceberg_entry to DatalakeBaseClient to
eliminate regex/classify duplication between GCS and S3 clients (DRY)
- Replace single-pass O(N) memory approach with two-pass streaming: pass 1 builds
iceberg_tables dict only (O(tables)), pass 2 streams regular files without
accumulation (O(1) per object)
- Fix sys.modules stub in test_iceberg_discovery.py: use setdefault for all three
google module entries to avoid overwriting real installed packages
1 parent 3604a30 commit 8082b4e
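The two-pass streaming idea from the commit message can be sketched roughly as follows. This is an illustration only, not the code in this commit: `METADATA_RE`, `discover`, and `list_keys` are hypothetical names, the regex is a guess at a typical Iceberg metadata layout rather than the real `_ICEBERG_METADATA_RE`, and skipping files under a discovered table root is an assumption.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical pattern (not the project's _ICEBERG_METADATA_RE): matches
# <root>/metadata/v<N>.metadata.json and <root>/metadata/<N>-<id>.metadata.json
METADATA_RE = re.compile(
    r"^(?P<root>.+)/metadata/v?(?P<version>\d+)[^/]*\.metadata\.json$"
)


def discover(list_keys: Callable[[], Iterable[str]]) -> Iterator[Tuple[str, str]]:
    """Yield ("iceberg", metadata_key) per table, then ("file", key) for the rest.

    `list_keys` is called twice so that no full listing is ever held in memory.
    """
    # Pass 1: keep only the newest metadata file per table root, O(tables) memory.
    tables: Dict[str, Tuple[int, str]] = {}
    for key in list_keys():
        match = METADATA_RE.match(key)
        if match is None:
            continue
        root, version = match.group("root"), int(match.group("version"))
        if root not in tables or version > tables[root][0]:
            tables[root] = (version, key)

    for _, metadata_key in tables.values():
        yield ("iceberg", metadata_key)

    # Pass 2: stream regular files one at a time without accumulating them.
    # Assumption: anything under a discovered table root belongs to that table.
    for key in list_keys():
        if any(key.startswith(root + "/") for root in tables):
            continue
        yield ("file", key)
```

The trade-off in this sketch is one extra listing call against the bucket in exchange for memory that grows with the number of tables instead of the number of objects.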
5 files changed
Lines changed: 77 additions & 68 deletions
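The `sys.modules` stub fix mentioned in the commit message relies on `dict.setdefault` semantics. A minimal sketch of that pattern, assuming a helper named `stub_module` (hypothetical, not taken from `test_iceberg_discovery.py`):

```python
import sys
import types


def stub_module(name: str) -> types.ModuleType:
    """Register a placeholder module under `name` only if nothing is there yet.

    sys.modules is a plain dict, so setdefault returns the existing entry
    when the real package is already imported and inserts the stub otherwise.
    """
    return sys.modules.setdefault(name, types.ModuleType(name))
```

With a plain assignment (`sys.modules[name] = stub`) an already imported package would be replaced for the rest of the test session, which is the failure mode the commit message describes.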
File tree
- ingestion
- src/metadata
- ingestion/source/database/datalake/clients
- readers/dataframe
- tests/unit/source/database
Lines changed: 24 additions & 1 deletion
Lines changed: 13 additions & 28 deletions
Lines changed: 16 additions & 30 deletions
Lines changed: 13 additions & 8 deletions