I need to extract a part of a string that may appear 1 to n times in each line.
For instance, this would reflect what I need:
This [dbo].[something] is a text containing [dbo].[something_else], then okay?
And then, [dbo].[something] may appear just once.
But why, nothing prevents [dbo].[something] from appearing twice as [dbo].[something] here.
And then can be three times, as [dbo].[something] is [dbo].[anything] but [dbo].[elsewhere] here.
[dbo].[otherthing] depicts another scenario with just one and pattern heading line
Or, also [dbo].[ultra] with an arbitrary amount of [dbo].[references] but ending with [dbo].[pattern]
As you may have noticed, the pattern would be \[dbo\]\.\[[^]]+\]. For instance, from the text above, I would want a result of:
something something_else something something something something anything elsewhere otherthing ultra references pattern
Then I can just inline everything (or append to a bash array) and filter duplicates; this shouldn’t be an issue. I am just having trouble figuring out how to do this extraction in a single sweep.
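For the duplicate-filtering step, here is a minimal sketch, assuming the names have already been extracted one per line. The function name `dedupe_names` is hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: drop repeated lines while keeping first-seen order.
# Assumes one extracted name per line on standard input.
dedupe_names() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true only for the first occurrence, and awk prints just that one.
    awk '!seen[$0]++'
}

# Example input: a few of the names from the expected result above.
unique_names=$(printf '%s\n' something something_else something anything | dedupe_names)
printf '%s\n' "$unique_names"
```

If the original order does not matter, `sort -u` would do the same job. In bash 4 or later, the resulting lines could then be read into an array with `mapfile -t`.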
What I have here results in extracting just the last match of each line (it is obvious why when you are used to sed’s “greedy” approach to pattern matching):
cat dborefs.txt | sed -E "s/(.*\[dbo\]\.\[([^]]+)\].*)*/\2/g"
something_else
something
something
elsewhere
otherthing
pattern
I could extract, then replace the matched patterns so that they no longer match, then extract again until I get no more matches, but that sounds far too cumbersome, all bash overhead considered; it would be best to extract everything in a single call to sed. I feel this should be possible, I just can’t easily figure out how. Thinking this may be useful for others, I felt that sharing the matter here could prove fruitful for the community.
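One way to get every match in a single pass is sketched below. It uses `grep -o` feeding `sed` rather than `sed` alone, so it is an alternative and not the single-`sed` solution asked for. It assumes a `grep` that supports `-o` (GNU and BSD grep do); the function name `extract_dbo_names` is hypothetical.

```shell
# Hypothetical helper: print every [dbo].[name] reference, one name per line.
# Reads the files given as arguments, or standard input when there are none.
extract_dbo_names() {
    # grep -o prints each match on its own line, so several matches on one
    # input line are all reported. A "]" outside a bracket expression is an
    # ordinary character, so it needs no backslash.
    grep -oE '\[dbo]\.\[[^]]+]' "$@" |
        # Strip the leading "[dbo].[" and the trailing "]" from each match.
        sed -e 's/^\[dbo]\.\[//' -e 's/]$//'
}

# Example input: the line from the sample text that has three references.
sample='And then can be three times, as [dbo].[something] is [dbo].[anything] but [dbo].[elsewhere] here.'
names=$(printf '%s\n' "$sample" | extract_dbo_names)
printf '%s\n' "$names"
```

With the sample file from above, the call would be `extract_dbo_names dborefs.txt`. Because the greedy `.*` is never involved, no match can swallow the ones before it.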