(Just some feedback for the project's tech writer) I always include a "What problem does this solve?" section and screenshots in my docs. I think they help people understand a project better.
Here too, I understood Hop's purpose only after seeing the screenshots on secondary pages like https://hop.apache.org/manual/latest/getting-started/hop-gui.... Abstract statements like "aims to facilitate all aspects of data and metadata orchestration" on the front page, or even in the "What is Hop?" doc, didn't help.
I too am getting tired of this shite. This disease afflicts corporate product descriptions to the point that I just use Wikipedia to find out WTF the £££expensive product actually does, and it's now metastasizing to free software.
"Apache Hop, short for Hop Orchestration Platform, is a data orchestration and data engineering platform that aims to facillitate all aspects of data and metadata orchestration. Hop lets you focus on the problem you’re trying to solve without technology getting in the way"
What is 'data orchestration'? Ditto 'data engineering platform'?
'facilitate all aspects of data and metadata orchestration': what the hell does this even mean?
'Hop lets you focus on the problem you’re trying to solve' so what problem do you think I'm trying to solve?
It's just so bizarre, it's like language meaning has separated from language itself, like layers of plywood left in the rain. And there is no wikipedia page to help out.
> Apache Hop, short for Hop Orchestration Platform, is a data orchestration and data engineering platform that aims to facilitate all aspects of data and metadata orchestration. Hop lets you focus on the problem you’re trying to solve without technology getting in the way. Simple tasks should be easy, complex tasks need to be possible.
> Hop allows data professionals to work visually, using metadata to describe how data should be processed. Visual design enables data developers to focus on what they want to do instead of how that task needs to be done. This focus on the task at hand lets Hop developers be more productive than they would be when writing code.
I don't understand how so many Apache-hosted projects P claim to let someone focus on X without Y getting in the way, while totally ignoring the complexity of introducing P and altering everything to align with P, which itself forbids focusing on X.
Thanks for expanding on that; it reads like it's some Airflow competitor. I'd be curious how it handles all the authentication management for the various pipeline elements.
I know there's a degree of oversimplification going on here, but there's something to be said for having a simple bullet-list breakdown of all the use-cases - alongside the best tool for each use-case.
It serves as a practical starting point for narrowing down the list of tools (of which there are so many) before one proceeds with a deeper dive into the best-fitting tool.
Would be great if there were a site that did this sort of thing for all the common architectural needs.
+1 to AWS Step Functions. At my last three companies I have built fairly complicated workflows with them, and once you get used to them they are very powerful, reliable, and cheap.
I just wish there were a little bit more monitoring on top of them, but that's nothing you can't build yourself.
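For anyone curious what "building it yourself" looks like in practice, here's a minimal sketch using the AWS SDK for Java v2: start an execution, then poll its status. The state machine ARN and input are made up for illustration.

    import software.amazon.awssdk.services.sfn.SfnClient;
    import software.amazon.awssdk.services.sfn.model.DescribeExecutionRequest;
    import software.amazon.awssdk.services.sfn.model.StartExecutionRequest;
    import software.amazon.awssdk.services.sfn.model.StartExecutionResponse;

    public class WorkflowMonitor {
        public static void main(String[] args) {
            try (SfnClient sfn = SfnClient.create()) {
                // Kick off a workflow run. The ARN and input below are hypothetical.
                StartExecutionResponse started = sfn.startExecution(StartExecutionRequest.builder()
                        .stateMachineArn("arn:aws:states:us-east-1:123456789012:stateMachine:CreditCheck")
                        .input("{\"applicantId\":\"42\"}")
                        .build());

                // Poll the execution status -- the sort of basic monitoring
                // you end up layering on top of Step Functions yourself.
                var status = sfn.describeExecution(DescribeExecutionRequest.builder()
                        .executionArn(started.executionArn())
                        .build());
                System.out.println(status.statusAsString()); // RUNNING, SUCCEEDED, FAILED, ...
            }
        }
    }

In a real setup you'd push these status checks into a dashboard or alerting job rather than polling from main().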
Ooh, Drools plugins for rule-based event processing. Neat. Hope I can find some examples!
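In the meantime, here's a rough sketch of what driving Drools from plain Java looks like. The session name and fact class are hypothetical, and the actual rules would live in .drl files wired up via a kmodule.xml on the classpath.

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public class RuleDemo {
        // Hypothetical fact class that rules written in DRL would match against.
        public record TemperatureReading(String sensor, double celsius) {}

        public static void main(String[] args) {
            KieServices ks = KieServices.Factory.get();
            // Picks up rule definitions (.drl) found on the classpath.
            KieContainer container = ks.getKieClasspathContainer();
            KieSession session = container.newKieSession("eventSession"); // hypothetical name
            try {
                // Insert a fact; any matching rules decide what happens to it.
                session.insert(new TemperatureReading("boiler-1", 97.5));
                int fired = session.fireAllRules();
                System.out.println("Rules fired: " + fired);
            } finally {
                session.dispose();
            }
        }
    }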
I haven't used Airflow, but my impression is this fits a similar role. That it's built atop good tech like Apache Beam and can use things like Flink is, in my book, a nice win.
Workflow applications get reinvented all the time (there are hundreds of them out there, plus many “standards”). However, I have never seen a successful use of workflow applications in industry. So my question: does anybody have good examples of workflow applications being more successful than “normal” business applications?
Built an automated credit decision support system for a major financial services institution that used workflows. We needed masses of legacy data systems integration, rule-based decisions, and human task coordination.
This system was built with Microsoft's BizTalk around 2006. It performed very well in production, but BizTalk had quite a few gotchas that had to be creatively worked around during development.
How does this compare with Apache DolphinScheduler? That seems to fit the orchestration / workflow scheduling role pretty well, and seems like the next iteration of Airflow... I'm not quite getting how this compares, and I'm not finding anyone directly comparing them on Google (though that's been less reliable lately).
Java, like .NET, is just a solid application platform: statically typed, with good performance.
Java has a history in big systems going on 30 years now.
Rust, Python, and Go are just not there yet. Rust is too low level, Python is not statically typed and will always suffer performance-wise, and Go ... is a youngster :). And .NET is not always everyone's free choice.
And Apache, well, they just liked Java for their applications. They started with some C/C++ code but then quickly accumulated a lot of Java tech.
Performance, portability, stability, scalability, concurrency, ecosystem (libraries, etc.) ... despite all the new languages around, there still aren't many alternatives that offer the same combination of all of these, to the same level, as Java does.
Something I've been asking for a long time as well. Java/JVM are great, but it would be great to see _some_ diversity in the Big Data ecosystem when it comes to implementations. :)
You have to deal with the confusion and complexity that is "Java dependencies".
Everything is OOP-abstraction-heavy APIs?
I’ve spent too much of my time and effort recently debugging Scala/Spark/JVM resource issues/dependency issues and the more I have to deal with it, the less I want anything to do with JVM-based solutions. The closest I want to get to another awful JVM application is a docker container.
I will rejoice the day that Spark alternatives progress enough for our team to replace our workloads and I can throw our Spark stuff into the literal bin.
Yeah, but you are basically asking for a world which has fragmented implementations, such that some stuff will interoperate and be usable within Spark et al., and "the new stuff" which could either all be on the same platform or much more likely spread across a number of platforms.
Once you have gotten the hang of running Spark (and a working knowledge of the JVM itself), you have paid the cost and it's done. Adding a whole bunch of new stuff just means more costs to pay, less integration, and more fragmentation of knowledge.
If we were to standardise on a framework, I don't think Java/Spark is the right one. It's notoriously fickle and fragile to run, it's obscenely resource-heavy, often not actually that fast, and its construction is locked into a set of implementations that have aged poorly: you basically need a JVM wizard to tune everything correctly for you, and you need to run so much supporting infrastructure that I think it greatly erodes the benefits of Spark.
> "the new stuff" which could either all be on the same platform or much more likely spread across a number of platforms.
Yes. I’d like to see more variety in approaches, more variety in specialisations, and for interop between “platforms” to occur at a slightly “higher” level (i.e. maybe a common SQL dialect, or a common set of messaging/interop protocols).
I think rather than pouring all of our effort into a singular platform, we should put our effort towards improving our tooling and languages so that it is more straightforward and viable to build these platforms.
Every language has got its own web-server framework(s); I see Spark et al. as the web-server framework of the data space.
From roughly 2000 to 2010, or even 2015, either Java or .NET was the default choice for big enterprise companies. Nobody ever got fired for picking Microsoft, or Java I would add, as they say. A lot of these Apache projects were donated from work at big enterprises, so I imagine they come out of that enterprise background.
Even now, what options are a huge step up from Java/.NET for your standard tech business backend/webapp? They integrate well with so many other languages, and both have great cross-platform stories...
Outside of the SV bubble, practically none. When you look from a conservative angle (static typing, developer availability, productivity, tool support, library support, performance, etc.), the list quickly narrows down to Java and .NET. The dynamic, interpreted nature of Python and JavaScript is their core deal breaker.
Java classloading is a natural fit for calling into a typesafe plugin fetched from a URL. Spark can even serialize lambdas and distribute their execution.
It seems to be more of a hassle to do this kind of thing over IPC with native binaries, and if ARM starts displacing a lot of x86-64 in datacenters it gets more complicated.
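As a rough sketch of that classloading story (the jar URL, plugin class name, and interface are all hypothetical; the point is just that the JVM can load and typecheck code fetched at runtime):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class PluginHost {
        // A shared contract the remote plugin jar is assumed to be compiled against.
        public interface Plugin { void run(); }

        public static void main(String[] args) throws Exception {
            URL pluginJar = new URL("https://example.com/plugins/my-plugin.jar"); // hypothetical location
            try (URLClassLoader loader =
                    new URLClassLoader(new URL[] { pluginJar }, PluginHost.class.getClassLoader())) {
                Class<?> cls = loader.loadClass("com.example.MyPlugin"); // hypothetical class name
                Plugin plugin = (Plugin) cls.getDeclaredConstructor().newInstance();
                plugin.run(); // the cast enforces the shared interface, so the call is typesafe
            }
        }
    }

Doing the equivalent over IPC with native binaries means inventing your own serialization format, process lifecycle, and per-architecture binary distribution, which is the hassle being alluded to.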