106 00:13:46.514 --> 00:13:54.695 Philip Durbin: Well, so that's one pain point. I can totally understand that being a pain point. And we talked a little bit about the documentation
107 00:13:55.740 --> 00:13:59.539 Philip Durbin: and anything else you want to mention, or just first impressions.
108 00:14:00.297 --> 00:14:03.140 Svetlana Lebedeva: Yeah, I guess, since we are edited.
109 00:14:03.140 --> 00:14:03.570 Philip Durbin: Other things.
110 00:14:03.570 --> 00:14:05.429 Svetlana Lebedeva: A bit of introduction, I think.
111 00:14:05.730 --> 00:14:12.169 Svetlana Lebedeva: One of the first problems that we are now trying to solve by writing our own external tool is
112 00:14:12.290 --> 00:14:21.849 Svetlana Lebedeva: the absence of any way to fill in sample-level information, let's say.
113 00:14:22.000 --> 00:14:36.679 Svetlana Lebedeva: I mean, compared to, for example, FAIRDOM-SEEK, where basically a user, a biologist in our case, can just fill out an Excel sheet with multiple samples, right, and upload it, and FAIRDOM will just ingest it and make
114 00:14:36.840 --> 00:14:44.509 Svetlana Lebedeva: fields out of it, right? And we did not find this possibility in the front end.
115 00:14:44.710 --> 00:14:50.000 Svetlana Lebedeva: So we're just trying to write our own tool right now that makes it very, very,
116 00:14:50.180 --> 00:14:55.770 Svetlana Lebedeva: let's say, low-effort for the user to fill out, because otherwise
117 00:14:56.090 --> 00:15:08.510 Svetlana Lebedeva: the workaround is, let's say, compound fields, which can then be repeated multiple times. So you have sample one, and then you have a lot of metadata fields for sample one, and then you
118 00:15:08.660 --> 00:15:22.670 Svetlana Lebedeva: click on plus and add more and more. But this is extremely tedious for the user to fill out. Usually they have 20 to 30 samples; you cannot ask them to, like, you know,
enter all of that manually. And this is actually what we are working on right now.
119 00:15:22.990 --> 00:15:25.700 Svetlana Lebedeva: Just as maybe more of a comment:
120 00:15:26.880 --> 00:15:29.430 Svetlana Lebedeva: more users have this problem.
121 00:15:29.650 --> 00:15:39.620 Svetlana Lebedeva: In biology we do have a lot of metadata which belongs to only one sample, but still we want to combine the samples into the same dataset, right?
122 00:15:40.140 --> 00:15:42.570 Philip Durbin: Kind of one structural point.
123 00:15:43.510 --> 00:15:49.399 Philip Durbin: You mentioned another tool that does this already. Would you be able to put a link in the chat to that tool? I'm just curious.
124 00:15:49.400 --> 00:15:51.749 Svetlana Lebedeva: We are developing it ourselves, and
125 00:15:52.430 --> 00:15:56.879 Svetlana Lebedeva: it's not ready. It's being developed right now.
126 00:15:56.880 --> 00:16:01.540 Philip Durbin: But I thought you said there was a tool. I know you're writing your own external tool, but I thought it was sort of
127 00:16:01.760 --> 00:16:02.780 Philip Durbin: because...
128 00:16:02.780 --> 00:16:04.130 Svetlana Lebedeva: We could not find any, so we...
129 00:16:04.130 --> 00:16:05.170 Philip Durbin: You couldn't find it.
130 00:16:05.170 --> 00:16:06.220 Svetlana Lebedeva: Taken.
131 00:16:06.608 --> 00:16:13.610 Philip Durbin: And then, just so I understand, you're saying a sample can have lots and lots of metadata fields,
132 00:16:14.810 --> 00:16:24.340 Philip Durbin: and so is any of that information contained inside a file that the biologist would upload, or...
133 00:16:24.340 --> 00:16:25.170 Svetlana Lebedeva: Oh!
134 00:16:25.170 --> 00:16:27.239 Philip Durbin: No, they have to manually enter it all.
135 00:16:27.240 --> 00:16:33.980 Svetlana Lebedeva: Exactly. That's the thing. Our idea is
now that they will just fill in a simple
136 00:16:34.080 --> 00:16:38.130 Svetlana Lebedeva: Excel-sheet-like table. We just write a small
137 00:16:38.250 --> 00:16:43.690 Svetlana Lebedeva: Python app that will do that, right, that will allow them to do that, because...
138 00:16:44.545 --> 00:16:49.973 Svetlana Lebedeva: Oh, sorry, now I'm digressing. What was the original question?
139 00:16:50.670 --> 00:16:53.365 Philip Durbin: Well, it just sounds like you have a lot of metadata fields.
140 00:16:53.590 --> 00:16:54.070 Svetlana Lebedeva: Yes.
141 00:16:54.070 --> 00:16:57.395 Philip Durbin: You're trying to simplify the workflow.
142 00:16:57.950 --> 00:17:03.389 Svetlana Lebedeva: It's hard, because we also want to then automatically generate
143 00:17:04.058 --> 00:17:27.890 Svetlana Lebedeva: metadata to be uploaded to common data-sharing databases on the web, right? And then we are also restricted by the fields that they have, and they demand a lot of fields: you know, genotype, organism, sex, whatever tissue you can imagine. There are about 20 fields which, you know, still need to be filled out.
144 00:17:29.520 --> 00:17:32.424 Philip Durbin: Right. Well, a couple of thoughts.
145 00:17:33.110 --> 00:17:39.200 Philip Durbin: One thing is, I wrote a little script. We have this new repo called dataverse-recipes.
146 00:17:41.750 --> 00:17:45.290 Philip Durbin: And what you're talking about reminds me of the one
147 00:17:45.610 --> 00:17:49.610 Philip Durbin: I call it "create datasets from Excel," because basically
148 00:17:50.210 --> 00:17:57.400 Philip Durbin: our head curator, Sonya, said, "I have this Excel sheet with, like, 96...
149 00:17:58.299 --> 00:18:04.929 Philip Durbin: We want to create 96 datasets based on these rows." And so that's what this script does.
150 00:18:05.090 --> 00:18:12.380 Philip Durbin: It just reads in, you know... and of course every Excel sheet is going to be different.
But possibly you could find it helpful.
151 00:18:13.121 --> 00:18:16.140 Philip Durbin: I'll just put a link to it in the notes.
152 00:18:17.740 --> 00:18:20.249 Philip Durbin: And then, in terms of
153 00:18:21.140 --> 00:18:26.100 Philip Durbin: you know, the fields: as you may know, Dataverse has
154 00:18:26.280 --> 00:18:29.470 Philip Durbin: this concept of custom metadata blocks,
155 00:18:29.610 --> 00:18:36.041 Philip Durbin: so you could create your own that has all the fields that you need. Are you aware of this?
156 00:18:36.380 --> 00:18:41.020 Svetlana Lebedeva: We are using them, of course, for our own metadata fields.
157 00:18:41.280 --> 00:18:54.169 Svetlana Lebedeva: The problem is still: you have datasets, and then we want to have, let's say... this is a missing level, which is under a dataset, right? So we want to have, let's say,
158 00:18:54.570 --> 00:18:58.020 Svetlana Lebedeva: 20 samples there, and then,
159 00:18:58.350 --> 00:19:10.740 Svetlana Lebedeva: yeah, the user should be able to maybe copy-paste. If it's all, like, human samples, they should be able to, you know, type "human" and copy-paste it instead of typing it 20 times.
160 00:19:11.090 --> 00:19:16.539 Philip Durbin: Yeah. Are you aware of the dataset templates feature?
161 00:19:18.660 --> 00:19:24.720 Svetlana Lebedeva: Yes, but it's still restricted. The problem is in the fields themselves, right?
162 00:19:25.200 --> 00:19:26.450 Svetlana Lebedeva: It's kind of...
163 00:19:26.780 --> 00:19:45.230 Svetlana Lebedeva: I don't know, I'm not explaining it well. It's not so easy. For you, probably, our 20 samples would be 20 datasets, right? But we want to keep them all in one dataset together, because they have been processed together, let's say, right?
164 00:19:45.520 --> 00:19:55.739 Svetlana Lebedeva: That's the thing. It's just more like we're trying to substitute a missing sublevel with this workaround, right?
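[Editor's note: a minimal sketch of the "create datasets from Excel" idea discussed above, assuming the general shape of the Dataverse native API's dataset JSON; the spreadsheet columns and sample rows below are invented for illustration, and the actual recipe script may work differently.]

```python
# Sketch: turn each spreadsheet row into one Dataverse dataset payload.
import csv
import io

def row_to_dataset_json(row):
    """Build a minimal native-API dataset payload from one row (a dict)."""
    def field(type_name, value, type_class="primitive", multiple=False):
        return {"typeName": type_name, "multiple": multiple,
                "typeClass": type_class, "value": value}

    citation_fields = [
        field("title", row["title"]),
        # Compound, repeatable fields nest sub-fields inside dicts --
        # the same mechanism behind the "click on plus" workaround above.
        field("author", [{"authorName": field("authorName", row["author"])}],
              type_class="compound", multiple=True),
        field("dsDescription",
              [{"dsDescriptionValue": field("dsDescriptionValue",
                                            row["description"])}],
              type_class="compound", multiple=True),
        field("subject", ["Medicine, Health and Life Sciences"],
              type_class="controlledVocabulary", multiple=True),
    ]
    return {"datasetVersion":
            {"metadataBlocks": {"citation": {"fields": citation_fields}}}}

# A tiny in-memory stand-in for the biologists' sheet (two sample rows).
sheet = io.StringIO(
    "title,author,description\n"
    "Sample 1,Doe,RNA-seq of tissue A\n"
    "Sample 2,Doe,RNA-seq of tissue B\n"
)
payloads = [row_to_dataset_json(row) for row in csv.DictReader(sheet)]
```

Each payload could then be sent to the native API's create-dataset endpoint (POST /api/dataverses/{alias}/datasets, with an API token in the X-Dataverse-key header); a real script would add error handling and, for .xlsx input, a reader such as openpyxl.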
165 00:19:57.550 --> 00:19:58.180 Philip Durbin: Okay.
166 00:19:58.320 --> 00:19:58.995 Philip Durbin: Well,
167 00:20:00.590 --> 00:20:12.480 Philip Durbin: Zulip has a lot more than just the containers channel. There's one called troubleshooting, and one called community, if you want to ask these more,
168 00:20:12.650 --> 00:20:17.569 Philip Durbin: you know, non-container questions, just like "how does Dataverse work," and ideas and things like that.
169 00:20:17.740 --> 00:20:21.120 Philip Durbin: You're very welcome to start a topic on that as well.
170 00:20:24.940 --> 00:20:27.109 Philip Durbin: Oh, and welcome, Makio. Thanks for coming.
171 00:20:27.580 --> 00:20:50.820 Philip Durbin: Anything else, any other, I don't know, initial... One thing I wanted to mention, too, is that we are getting funding right now. Well, it's sort of suspended, if I'm being honest, because of the current administration. But we're getting funded by the NIH, the National Institutes of Health. And so we have a particular interest in medical and biological
172 00:20:51.170 --> 00:20:54.722 Philip Durbin: samples and data right now. So, yeah,
173 00:20:55.460 --> 00:21:11.173 Philip Durbin: so possibly I could get you in touch with people on my team that are, you know, looking for people to talk about use cases in biology and medicine and all these things. So, I don't know, let me know if you want me to try to get you in touch.
174 00:21:12.700 --> 00:21:19.639 Svetlana Lebedeva: I think it's always good if, let's say, they can give and get feedback on
175 00:21:19.640 --> 00:21:20.110 Philip Durbin: Yeah, well.
176 00:21:20.110 --> 00:21:22.999 Svetlana Lebedeva: what we are struggling with, and what other biological
177 00:21:23.380 --> 00:21:29.010 Svetlana Lebedeva: Dataverses are struggling with. That might actually make a lot of sense for you.
178 00:21:29.600 --> 00:21:32.257 Philip Durbin: Right, from actual scientists doing the work.
179 00:21:33.010 --> 00:21:34.680 Philip Durbin: It's always helpful for us.
180 00:21:36.331 --> 00:21:42.269 Philip Durbin: Cool. Okay, well, anything else you wanna mention up here, up front?
181 00:21:42.550 --> 00:21:43.850 Svetlana Lebedeva: I already took...
182 00:21:44.390 --> 00:21:48.060 Philip Durbin: Oh, it's fine, it's fine. No, I really appreciate the feedback. It's great.