Miguel de Icaza's blog: Stream.CopyStream - Miguel de Icaza

  • Derek · 1 year ago
    With extension methods, it's a bit cleaner than it used to be (I call mine CopyTo), but yes I have implemented a variation of this many, many times.
  • Jonathan Pryor · 1 year ago
    Sounds like another addition for rocks-playground!
  • Rasmus · 1 year ago
    I have also implemented that quite a few times and been annoyed by the need to do so.

    But I think I know why it is not part of Stream: there are simply too many decisions to make if such a method must robustly handle all kinds of streams: what buffer size to use; what a suitable time-out is, particularly for network streams; whether failed operations should be retried, and how many times; and so on.
  • Jonathan Pryor · 1 year ago
    I don't think it's quite that bad, as sane defaults exist for your concerns.

    Buffer size? A reasonable number, e.g. 4096 or 8192 bytes -- something close to filesystem block size without causing too much trouble for the GC.

    Time outs? Ignore them. If timeouts are needed, then the calling code can set Stream.ReadTimeout and Stream.WriteTimeout appropriately; I don't see why a helper method should care.

    Retries? Retries are evil: http://blogs.msdn.com/oldnewthing/archive/2005/....

    A utility method doesn't need to be the be-all, end-all method to be useful. It just needs to provide a sane solution to a problem that would otherwise be hand-written every time.
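For illustration, the defaults Jonathan lists (a fixed buffer of a few kilobytes, no timeout handling, no retries) could be sketched as the extension method below. This is a minimal sketch, not the Mono.Rocks code: the class name `StreamCopySketch` and the method name `CopyStreamTo` are hypothetical, and the name deliberately avoids `CopyTo` because .NET 4.0 and later ship a built-in `Stream.CopyTo` instance method that would take precedence over an extension method of the same name.

```csharp
using System;
using System.IO;

public static class StreamCopySketch
{
    // Hypothetical helper (not from the thread): copies everything that remains
    // in `source` into `destination` and returns the number of bytes copied.
    // Timeouts are left to the caller (Stream.ReadTimeout / Stream.WriteTimeout);
    // nothing is retried.
    public static long CopyStreamTo(this Stream source, Stream destination, int bufferSize = 8192)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Read may return fewer bytes than requested; it returns 0 only at end of stream.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

The source stream is consumed by the call; neither stream is closed or flushed, which stays the caller's responsibility.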
  • Seth · 1 year ago
    mmm I have always stopped short of implementing this in a framework bit (rock, part of runtime, whatever), because.... quite frankly I don't see the added value.

    In my intuition it almost always comes down to deficiencies in the design of the stream libraries/consumers that create the need to 'copy' a stream. Invariably this is more like 'relabeling', just to suit the taste of a receiving party. Thinking about it some more, this might simply be a wrong intuition (I can think of simple counterexamples like copying a socket to a file).

    In all non-trivial cases, however, Copy is simply a misnomer, because a fair bit of transformation is usually involved (charsets, endianness, line ends, etc).

    In my experience, C++ <iostream> is about the only library that 'gets it right' (.NET duplicates too many of the pitfalls from Java, though fortunately fewer of them). In C++, any compatible streams (i.e. those with well-known conversions or a 1:1 binary equivalence) can simply be copied by saying smart things like:
    std::cout << fstream1.rdbuf() /* << std::eos */;
    Essentially: it is the separation of buffer and stream that saves the day. (Don't try std::cout << fstream1; unless you are very interested in the (hex) address of fstream1 instance).
  • Jonathan Pryor · 1 year ago
    I think you don't fully understand .NET's Stream concept, as it's closer to C++ than Java.

    Stream is _only_ byte oriented. No encodings, no endianness, no line endings, just raw data. It is thus analogous to the C++ std::streambuf type, if even more primitive (there is no wstreambuf equivalent).

    StreamReader and StreamWriter are responsible for text-oriented manipulation, such as encoding issues, end of line encodings, etc., which is what std::istream and std::ostream deal with in C++ (and more).
  • Martin · 1 year ago
    Java's streams are also byte-oriented and raw.
  • Keith J. Farmer · 1 year ago
    The comments about extension methods are completely on the mark: separate common operations from the object when those operations don't have any direct bearing on the state. Creating a chain of adapters (buffer, splitter, copier, etc.) would be more useful, but only if they didn't belong directly to Stream. In fact, I would be very tempted to put them as extensions to IEnumerable<T>, and have an adapter to transform Stream into IEnumerable<T>.
  • Jonathan Pryor · 1 year ago
    I actually experimented with this in rocks-playground, providing a `IEnumerable<byte> ToEnumerable(this Stream)` method. It was cute.

    I removed it.

    The problem is that a Stream-backed IEnumerable<T> IS NOT an IEnumerable<T>, as it tends to break things rather badly. For example, many LINQ methods (and rocks-playground IEnumerable<T> extension methods) assume that the sequence can be consumed repeatedly.

    A stream-backed IEnumerable<byte> can only be iterated over *once*, reliably. Beyond that, and you start getting into seeking issues, which _really_ start killing the thought (NetworkStreams don't seek), etc.

    It's cute, but likely unworkable in practice (at least in my experience).

    In spite of this, I'm still keeping some "once-only" iterator types (TextReaderRocks.Lines() and TextReaderRocks.Words()), as I think they're useful even if they can only be used once, but it's something you need to be careful about.
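The once-only behaviour described above is easy to reproduce. The following is a minimal sketch of a stream-backed enumerable, assuming a byte-at-a-time implementation; the method signature comes from the comment, but the class name `StreamEnumerableSketch` and the body are assumptions, not the removed rocks-playground code.

```csharp
using System.Collections.Generic;
using System.IO;

public static class StreamEnumerableSketch
{
    // Assumed implementation: each enumeration reads from wherever the
    // stream currently is, so it never rewinds to the start.
    public static IEnumerable<byte> ToEnumerable(this Stream source)
    {
        int b;
        // ReadByte returns -1 at end of stream.
        while ((b = source.ReadByte()) != -1)
            yield return (byte)b;
    }
}
```

Enumerating the result a second time yields nothing, because the first pass already left the stream at its end. Code that calls, say, LINQ's `Count()` and then iterates would silently see an empty sequence on the second pass.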
  • Keith J. Farmer · 1 year ago
    Actually, it worked out rather well for a serial port interface I was using a couple years ago.

    I'll agree that quite a few methods are predicated on re-enumerability, but I don't think that breaks things, if you're of the mind that IDisposable isn't abused when you use it for scoping Transactions (for example).
  • Alan · 1 year ago
    How could System.IO.Stream ever implement CopyStream? It'd be a nightmare to implement!

    The problem is that the Stream in question could be *anything*, e.g. a NetworkStream. By reading data from this stream (in order to make a copy) you will render the initial stream unusable because you've already read the data out, and you cannot rewind it. What you end up with is a single usable version of the stream instead of two.

    If you want to be able to 'Copy' a stream, you have to do it manually. You need to read all the data from the initial stream as a byte[], then instantiate all the MemoryStreams you want from this byte[].

    Suppose you wanted to implement Stream.CopyStream, you could create a class like

    public class StreamSplitter : Stream
    {
        public StreamSplitter (Stream initialStream)
        {
            // Do stuff
        }

        public Stream GetCopy()
        {
            // Do stuff
        }
    }

    Internally the StreamSplitter will read from the initial stream and store the data in a byte[]. GetCopy will then return a MemoryStream type class based on this byte[]. This class would have the ability to tell the StreamSplitter to get more data from the initial stream and make it available to all the copies which have been created.

    The problem is that any copy can initiate this request for more data at any time. This is a pain in the ass for thread synchronisation with regards to the initial stream. Then you have more threading problems when you try making this new data available to the existing copies. All I can say is that there's no way I'd ever implement something like that ;)
  • migueldeicaza · 1 year ago
    Alan,

    In that case, you would not use CopyStream. It is a perfectly fine restriction to say that the stream is consumed after CopyStream is finished.

    There is nothing preventing a "TeeStream" (like the Unix tee command) from duplicating the contents as it goes, and the operation described above could be composed using it.

    But many times I do not care about keeping a copy, and I do not care about chunked output, or whether it's a network stream or not; I just want to move the bytes from the source to the destination. And I have seen various broken versions of this loop, or versions that are too optimistic and have never been properly tested.
  • Alan · 1 year ago
    Well, a TeeStream class wouldn't be that complex. A lot of the complexity that I saw in a class which allowed copying a stream an arbitrary number of times, with each copy having the ability to progress the initial stream, would be gone. The implementation would be just read-once, write-many. No threading concerns there.

    Something like:

    public class TeeStream : Stream
    {
        public TeeStream (IEnumerable<Stream> destinationStreams)
        {

        }
    }

    would be fairly easy to implement. Something like that should already be in the BCL I suppose. There's no real reason why not.
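As a rough illustration of the read-once, write-many idea, here is a minimal sketch of a write-only TeeStream that forwards every write to each destination. Only the class name and constructor signature come from the comment above; everything else is an assumption. In particular, making the stream write-only, not disposing the destinations, and doing no error isolation between destinations are design choices of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class TeeStream : Stream
{
    private readonly Stream[] destinations;

    public TeeStream(IEnumerable<Stream> destinationStreams)
    {
        if (destinationStreams == null) throw new ArgumentNullException(nameof(destinationStreams));
        destinations = destinationStreams.ToArray();
    }

    // Write-only, non-seekable: this sketch only fans writes out.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Forward each write to every destination, in order.
    public override void Write(byte[] buffer, int offset, int count)
    {
        foreach (var destination in destinations)
            destination.Write(buffer, offset, count);
    }

    public override void Flush()
    {
        foreach (var destination in destinations)
            destination.Flush();
    }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

Copying a source stream into such a TeeStream with an ordinary copy loop would then give the "copy to several places at once" behaviour discussed in the thread, with the source still read only once.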
  • Steve Bjorg · 1 year ago
    Such an addition would be _wonderful_! Especially if it supported asynchronous copying/splitting. For instance, moving gigs of data from a network stream to another network stream (proxying) and a file stream (caching) simultaneously without blocking threads is a pretty complex operation. I would love to see a community vetted implementation of it.
  • Angel Ochoa · 1 year ago
    I guess the problem is that such an implementation (or even a definition as an abstract method) requires a lot of understanding about the differences between two streams. Such a copy would need to be "secured" by some contract that regulates what a CopyStream should mean, and right now I believe that there are as many CopyStreams as stream types (buffered, unbuffered, etc.). Extension methods are probably the best solution.
  • commenter · 1 year ago
    This was a good question.
    It just goes to show that if you ask a question like this you will get lots of people trying to look clever by thinking up reasons why it can't be done.
    I've implemented this function multiple times. If the framework can have File.WriteAllLines then it sure as hell should have a CopyStream method. If people have esoteric situations where a simple implementation wouldn't work.... well then, don't use the simple implementation!

    PS: TextReader should implement a 'Lines' enumerator.
  • Alan · 1 year ago
    Well, if all you're looking for is:

    stream2.Write(stream1), then that's trivial enough to implement. However if you want something more along the lines of giving you three identical copies of the same stream so you can independently read from the copies and progress the initial stream, then it's a lot harder.

    byte[] bytes = GetData();
    MemoryStream a = new MemoryStream(bytes);
    MemoryStream b = new MemoryStream(bytes);
    MemoryStream c = new MemoryStream(bytes);

    If you want a CopyStream that essentially does the above except using a Stream as the base rather than a byte[] as the base, the task is very difficult.
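For the simpler case where the source is finite and fits in memory, the manual approach (read everything into a byte[], then create MemoryStreams over it) could be sketched as follows. The helper name `ReadIntoCopies` and the class name are hypothetical. This does not address the harder case described above, where each copy can pull more data from a live source on demand.

```csharp
using System;
using System.IO;

public static class StreamSnapshotSketch
{
    // Hypothetical helper: drains `source` once, then returns independent
    // read-only MemoryStreams over the same buffered bytes.
    public static MemoryStream[] ReadIntoCopies(this Stream source, int copies)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (copies < 0) throw new ArgumentOutOfRangeException(nameof(copies));

        byte[] bytes;
        using (var buffered = new MemoryStream())
        {
            var chunk = new byte[8192];
            int read;
            while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
                buffered.Write(chunk, 0, read);
            bytes = buffered.ToArray();
        }

        var result = new MemoryStream[copies];
        for (int i = 0; i < copies; i++)
        {
            // Each copy has its own position; all share one read-only backing array.
            result[i] = new MemoryStream(bytes, writable: false);
        }
        return result;
    }
}
```

Each returned stream keeps its own position, so readers can progress independently, but the whole source has to be buffered before any copy can be used.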
  • Atif Aziz · 1 year ago
    Alan, I don't see why it would be difficult to implement CopyStream with a Stream as both source and target. Here's the implementation that I use in projects:
    http://gist.github.com/12956

    In general, though, this is only useful for low-latency, small streams. In environments where scalability is key, copying may be best performed using async I/O (BeginRead-BeginWrite).
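A non-blocking variant of the copy loop might look like the sketch below. Note the substitution: it uses the Task-based `ReadAsync`/`WriteAsync` methods (available on Stream since .NET 4.5) with async/await, rather than the BeginRead/BeginWrite pattern named in the comment. The class and method names are hypothetical, and this is not the code behind the gist linked above.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncStreamCopySketch
{
    // Hypothetical helper: same loop as a synchronous copy, but the calling
    // thread is released while each read and write is in flight.
    public static async Task<long> CopyStreamToAsync(this Stream source, Stream destination, int bufferSize = 8192)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // ReadAsync completes with 0 only at end of stream.
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            await destination.WriteAsync(buffer, 0, read).ConfigureAwait(false);
            total += read;
        }
        return total;
    }
}
```

Reads and writes are still strictly alternated here; overlapping them (reading the next chunk while the previous one is being written) would need a second buffer and is left out of this sketch.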
  • Alan · 1 year ago
    Ah, see, that's where I think we differ. That's just a simple myStream.Write (otherStream). It's completely different to a TeeStream or SplitStream as I described earlier ;) I did say that writing the contents of one stream to another was easy, but I didn't think that's what was being asked in the original question. Maybe I'm wrong there.
  • migueldeicaza · 1 year ago
    That is a really good idea.

    Are you aware of "Mono.Rocks"? It is a library of extension methods that some developers have been prototyping to add useful extension methods.
  • commenter · 1 year ago
    Nope, I hadn't heard of that.
    I checked the website and found that they've somehow managed to hack into my computer and steal my Int32.Times extension method, so I'll be contacting my lawyers forthwith.
  • Jonathan Pryor · 1 year ago
    > PS: TextReader should implement a 'Lines' enumerator.

    Indeed it should, which is why Mono.Rocks has a TextReader.Lines() extension method in rocks-playground. :-)
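A 'Lines' enumerator of the kind requested can be sketched in a few lines with an iterator. This is an assumed implementation for illustration, not the actual Mono.Rocks code, and the class name `TextReaderLinesSketch` is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextReaderLinesSketch
{
    // Assumed implementation: lazily yields one line per ReadLine call.
    // Like any reader-backed sequence, it can only be enumerated once.
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        string line;
        // ReadLine returns null at end of input.
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}
```

It can then be used in a foreach loop or with LINQ, e.g. `foreach (var line in reader.Lines()) { ... }`, keeping in mind that the reader is consumed as the sequence is iterated.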
  • commenter · 1 year ago
    Hah. With all these wheels being re-invented everywhere, Mono.Rocks is definitely a good idea.
  • Foxfire · 1 year ago
    Implementing such a method might lead to some confusion. On the other hand lots of people need it, so having it would be nice.
    But please name it appropriately: I think you want a "CopyData" Method (you want to copy the data that is transported by the stream not the stream itself if I understand correctly)
  • commenter · 1 year ago
    I called mine Decant, as in 'decant the data from one stream into another'.