106 00:13:46.514 --> 00:13:54.695 Philip Durbin: Well, so that's one pain point. I can totally understand that being a pain point. And we talked a little bit about the documentation
107 00:13:55.740 --> 00:13:59.539 Philip Durbin: and anything else you want to mention, or just first impressions.
108 00:14:00.297 --> 00:14:03.140 Svetlana Lebedeva: Yeah, I guess, since we are edited.
109 00:14:03.140 --> 00:14:03.570 Philip Durbin: Other things.
110 00:14:03.570 --> 00:14:05.429 Svetlana Lebedeva: A bit of introduction, I think.
111 00:14:05.730 --> 00:14:12.169 Svetlana Lebedeva: One of the first problems that we are now trying to solve by writing our own external tool is
112 00:14:12.290 --> 00:14:21.849 Svetlana Lebedeva: the absence of any way to fill in sample-level information, let's say.
113 00:14:22.000 --> 00:14:36.679 Svetlana Lebedeva: I mean, compared to, for example, FAIRDOM-SEEK, where basically a user, a biologist in our case, can just fill out an Excel sheet with multiple samples, right, and upload it, and FAIRDOM will just ingest it and make
114 00:14:36.840 --> 00:14:44.509 Svetlana Lebedeva: fields out of it, right? And we did not find this possibility in the front end.
115 00:14:44.710 --> 00:14:50.000 Svetlana Lebedeva: So we're just trying to write our own tool right now that makes it very, very,
116 00:14:50.180 --> 00:14:55.770 Svetlana Lebedeva: let's say, low-effort for the user to fill out, because otherwise
117 00:14:56.090 --> 00:15:08.510 Svetlana Lebedeva: the workaround is, let's say, compound fields, which can then be repeated multiple times. So you have sample one, and then you have a lot of metadata fields for sample one, and then you
118 00:15:08.660 --> 00:15:22.670 Svetlana Lebedeva: click on plus and add more and more. But this is extremely tedious for the user to fill out. Usually they have 20 to 30 samples; you cannot ask them to, like, you know,
enter all of that manually. And this is actually what we are working on right now.
119 00:15:22.990 --> 00:15:25.700 Svetlana Lebedeva: Just as maybe more of a comment:
120 00:15:26.880 --> 00:15:29.430 Svetlana Lebedeva: more users have this problem.
121 00:15:29.650 --> 00:15:39.620 Svetlana Lebedeva: In biology we do have a lot of metadata which belongs to only one sample, but still we want to combine the samples into the same dataset, right?
122 00:15:40.140 --> 00:15:42.570 Philip Durbin: Kind of one structural point.
123 00:15:43.510 --> 00:15:49.399 Philip Durbin: You mentioned another tool that does this already. Would you be able to put a link in the chat to that tool? I'm just curious.
124 00:15:49.400 --> 00:15:51.749 Svetlana Lebedeva: We are developing it ourselves, and
125 00:15:52.430 --> 00:15:56.879 Svetlana Lebedeva: it's not ready. It's being developed right now.
126 00:15:56.880 --> 00:16:01.540 Philip Durbin: But I thought you said there was a tool. I know you're writing your own external tool, but I thought it was sort of
127 00:16:01.760 --> 00:16:02.780 Philip Durbin: because...
128 00:16:02.780 --> 00:16:04.130 Svetlana Lebedeva: We could not find any, so we...
129 00:16:04.130 --> 00:16:05.170 Philip Durbin: You couldn't find it.
130 00:16:05.170 --> 00:16:06.220 Svetlana Lebedeva: Taken.
131 00:16:06.608 --> 00:16:13.610 Philip Durbin: And then, just so I understand, you're saying a sample can have lots and lots of metadata fields,
132 00:16:14.810 --> 00:16:24.340 Philip Durbin: and so is any of that information contained inside a file that the biologist would upload, or...
133 00:16:24.340 --> 00:16:25.170 Svetlana Lebedeva: Oh!
134 00:16:25.170 --> 00:16:27.239 Philip Durbin: No, they have to manually enter it all.
135 00:16:27.240 --> 00:16:33.980 Svetlana Lebedeva: Exactly. That's the thing. Our idea is
now that they will just fill in a simple
136 00:16:34.080 --> 00:16:38.130 Svetlana Lebedeva: Excel-sheet-like table. We just write a small
137 00:16:38.250 --> 00:16:43.690 Svetlana Lebedeva: Python app that will do that, right, that will allow them to do that, because...
138 00:16:44.545 --> 00:16:49.973 Svetlana Lebedeva: Oh, sorry, now I'm digressing. What was the original question?
139 00:16:50.670 --> 00:16:53.365 Philip Durbin: Well, it just sounds like you have a lot of metadata fields.
140 00:16:53.590 --> 00:16:54.070 Svetlana Lebedeva: Yes.
141 00:16:54.070 --> 00:16:57.395 Philip Durbin: You're trying to simplify the workflow.
142 00:16:57.950 --> 00:17:03.389 Svetlana Lebedeva: It's hard, because we also want to then automatically generate
143 00:17:04.058 --> 00:17:27.890 Svetlana Lebedeva: metadata to be uploaded to common data-sharing databases on the web, right? And then we are also restricted by the fields that they have, and they demand a lot of fields: you know, genotype, organism, sex, whatever tissue you can imagine. There are about 20 fields which, you know, still need to be filled out.
144 00:17:29.520 --> 00:17:32.424 Philip Durbin: Right. Well, a couple of thoughts.
145 00:17:33.110 --> 00:17:39.200 Philip Durbin: One thing is, I wrote a little script. We have this new repo called dataverse-recipes.
146 00:17:41.750 --> 00:17:45.290 Philip Durbin: And what you're talking about reminds me of the one
147 00:17:45.610 --> 00:17:49.610 Philip Durbin: I call it "create datasets from Excel," because basically
148 00:17:50.210 --> 00:17:57.400 Philip Durbin: our head curator, Sonya, said, "I have this Excel sheet with, like, 96...
149 00:17:58.299 --> 00:18:04.929 Philip Durbin: We want to create 96 datasets based on these rows." And so that's what this script does.
150 00:18:05.090 --> 00:18:12.380 Philip Durbin: It just reads in, you know... and of course every Excel sheet is going to be different.
But possibly you could find it helpful.
151 00:18:13.121 --> 00:18:16.140 Philip Durbin: I'll just put a link to it in the notes.
152 00:18:17.740 --> 00:18:20.249 Philip Durbin: And then, in terms of
153 00:18:21.140 --> 00:18:26.100 Philip Durbin: you know, the fields: as you may know, Dataverse has
154 00:18:26.280 --> 00:18:29.470 Philip Durbin: this concept of custom metadata blocks,
155 00:18:29.610 --> 00:18:36.041 Philip Durbin: so you could create your own that has all the fields that you need. Are you aware of this?
156 00:18:36.380 --> 00:18:41.020 Svetlana Lebedeva: We are using them, of course, for our own metadata fields.
157 00:18:41.280 --> 00:18:54.169 Svetlana Lebedeva: The problem is still: you have datasets, and then we want to have, let's say... this is a missing level, which is under a dataset, right? So we want to have, let's say,
158 00:18:54.570 --> 00:18:58.020 Svetlana Lebedeva: 20 samples there, and then,
159 00:18:58.350 --> 00:19:10.740 Svetlana Lebedeva: yeah, the user should be able to maybe copy-paste. If it's all, like, human samples, they should be able to, you know, type "human" and copy-paste it instead of typing it 20 times.
160 00:19:11.090 --> 00:19:16.539 Philip Durbin: Yeah. Are you aware of the dataset templates feature?
161 00:19:18.660 --> 00:19:24.720 Svetlana Lebedeva: Yes, but it's still restricted. The problem is in the fields themselves, right?
162 00:19:25.200 --> 00:19:26.450 Svetlana Lebedeva: It's kind of...
163 00:19:26.780 --> 00:19:45.230 Svetlana Lebedeva: I don't know, I'm not explaining it well. It's not so easy. For you, probably, our 20 samples would be 20 datasets, right? But we want to keep them all in one dataset together, because they have been processed together, let's say, right?
164 00:19:45.520 --> 00:19:55.739 Svetlana Lebedeva: That's the thing. It's just more like we're trying to substitute a missing sublevel with this workaround, right?
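[Editor's note: a minimal sketch of the "create datasets from Excel" idea discussed above, assuming the general shape of the Dataverse native API's dataset JSON; the spreadsheet columns and sample rows below are invented for illustration, and the actual recipe script may work differently.]

```python
# Sketch: turn each spreadsheet row into one Dataverse dataset payload.
import csv
import io

def row_to_dataset_json(row):
    """Build a minimal native-API dataset payload from one row (a dict)."""
    def field(type_name, value, type_class="primitive", multiple=False):
        return {"typeName": type_name, "multiple": multiple,
                "typeClass": type_class, "value": value}

    citation_fields = [
        field("title", row["title"]),
        # Compound, repeatable fields nest sub-fields inside dicts --
        # the same mechanism behind the "click on plus" workaround above.
        field("author", [{"authorName": field("authorName", row["author"])}],
              type_class="compound", multiple=True),
        field("dsDescription",
              [{"dsDescriptionValue": field("dsDescriptionValue",
                                            row["description"])}],
              type_class="compound", multiple=True),
        field("subject", ["Medicine, Health and Life Sciences"],
              type_class="controlledVocabulary", multiple=True),
    ]
    return {"datasetVersion":
            {"metadataBlocks": {"citation": {"fields": citation_fields}}}}

# A tiny in-memory stand-in for the biologists' sheet (two sample rows).
sheet = io.StringIO(
    "title,author,description\n"
    "Sample 1,Doe,RNA-seq of tissue A\n"
    "Sample 2,Doe,RNA-seq of tissue B\n"
)
payloads = [row_to_dataset_json(row) for row in csv.DictReader(sheet)]
```

Each payload could then be sent to the native API's create-dataset endpoint (POST /api/dataverses/{alias}/datasets, with an API token in the X-Dataverse-key header); a real script would add error handling and, for .xlsx input, a reader such as openpyxl.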
165 00:19:57.550 --> 00:19:58.180 Philip Durbin: Okay.
166 00:19:58.320 --> 00:19:58.995 Philip Durbin: Well,
167 00:20:00.590 --> 00:20:12.480 Philip Durbin: Zulip has a lot more than just the containers channel. There's one called troubleshooting, and one called community, if you want to ask these more,
168 00:20:12.650 --> 00:20:17.569 Philip Durbin: you know, non-container questions, just like "how does Dataverse work," and ideas and things like that.
169 00:20:17.740 --> 00:20:21.120 Philip Durbin: You're very welcome to start a topic on that as well.
170 00:20:24.940 --> 00:20:27.109 Philip Durbin: Oh, and welcome, Makio. Thanks for coming.
171 00:20:27.580 --> 00:20:50.820 Philip Durbin: Anything else, any other, I don't know, initial... One thing I wanted to mention, too, is that we are getting funding right now. Well, it's sort of suspended, if I'm being honest, because of the current administration. But we're getting funded by the NIH, the National Institutes of Health. And so we have a particular interest in medical and biological
172 00:20:51.170 --> 00:20:54.722 Philip Durbin: samples and data right now. So, yeah,
173 00:20:55.460 --> 00:21:11.173 Philip Durbin: so possibly I could get you in touch with people on my team that are, you know, looking for people to talk about use cases in biology and medicine and all these things. So, I don't know, let me know if you want me to try to get you in touch.
174 00:21:12.700 --> 00:21:19.639 Svetlana Lebedeva: I think it's always good if, let's say, they can give and get feedback on
175 00:21:19.640 --> 00:21:20.110 Philip Durbin: Yeah, well.
176 00:21:20.110 --> 00:21:22.999 Svetlana Lebedeva: what we are struggling with, and what other biological
177 00:21:23.380 --> 00:21:29.010 Svetlana Lebedeva: Dataverses are struggling with. That might actually make a lot of sense for you.
178 00:21:29.600 --> 00:21:32.257 Philip Durbin: Right, from actual scientists doing the work.
179 00:21:33.010 --> 00:21:34.680 Philip Durbin: It's always helpful for us.
180 00:21:36.331 --> 00:21:42.269 Philip Durbin: Cool. Okay, well, anything else you wanna mention up here, up front?
181 00:21:42.550 --> 00:21:43.850 Svetlana Lebedeva: I already took...
182 00:21:44.390 --> 00:21:48.060 Philip Durbin: Oh, it's fine, it's fine. No, I really appreciate the feedback. It's great.