Show HN: Autotab – An AI-powered Chrome extension to create Selenium scripts

hugs · on Oct 19, 2023

Selenium project creator here...

Very cool! I totally expect testing to be one of the killer apps for AI. And this is an easy way to get there.

How do you ensure the AI isn't hallucinating any details in the generated code?

jonasnelle · on Oct 19, 2023

That's awesome! We couldn't have built any of this without Selenium, definitely standing on the shoulders of giants :)

Right now we have the basic sanity check that the Selenium xpath selects exactly one element in the DOM. Going forward I think the best way to do this is to have a live preview where you can see the mirrored browser copy your last action and debug if it errors or isn't what you wanted.
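
A minimal sketch of the "exactly one element" sanity check described above. The `finder` callable is an assumption standing in for a browser lookup; the standard-library `ElementTree` page below supports only a small XPath subset and is there just so the example runs without a browser. This is not autotab's actual implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Sequence


def selects_exactly_one(finder: Callable[[str], Sequence], xpath: str) -> bool:
    """Return True only if the XPath matches a single element."""
    return len(finder(xpath)) == 1


# Stand-in "page" so the sketch runs without a browser (hypothetical markup).
PAGE = ET.fromstring(
    "<html><body>"
    "<button id='save'>Save</button>"
    "<button class='row'>Open</button>"
    "<button class='row'>Open</button>"
    "</body></html>"
)

unique = selects_exactly_one(PAGE.findall, ".//button[@id='save']")       # one match
ambiguous = selects_exactly_one(PAGE.findall, ".//button[@class='row']")  # two matches
missing = selects_exactly_one(PAGE.findall, ".//a")                       # no match
```

With Selenium, the finder could be `lambda xp: driver.find_elements(By.XPATH, xp)`, which returns a list that is empty when nothing matches.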

eastendguy · on Oct 25, 2023

> and debug if it errors or isn't what you wanted.

That is a potential problem, because at this point the user will still need good HTML/Selenium/coding knowledge to debug. We ran into the same issue using ChatGPT-generated scripts for ui.vision. Once a QA person has to figure out why some generated code does not work, it becomes a hassle and removes any initial advantage over the classic record & replay approach.

jonasnelle · on Oct 25, 2023

I agree, to really debug they need coding knowledge today.

But there's something in between where you can say "try again" or even give high level feedback like "don't click the element with that exact title, just click whichever one is first in the list". My working hypothesis is that this category of fixes is bigger than the one where you need coding knowledge. Certainly it will only grow as language models become better.

eddtries · on Oct 20, 2023

Can you have a feedback loop, for example trying the test in the background (headless) and if it doesn't work altering paths and retrying?

jonasnelle · on Oct 20, 2023

Yes! Two things we’re working on that would be very cool are 1) seeing the mirrored actions executed live as you record and 2) even if the website changes later, using AI at runtime to auto-heal the automation.
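
One simple, non-AI version of the retry loop discussed here: try a list of candidate selectors in order and keep the first one that matches exactly one element. The `finder` callable and the candidate list are assumptions for illustration; the AI-based auto-healing mentioned above would presumably generate new candidates rather than read them from a fixed list.

```python
from typing import Callable, Iterable, Optional, Sequence


def first_working_selector(
    finder: Callable[[str], Sequence],
    candidates: Iterable[str],
) -> Optional[str]:
    """Return the first selector that matches exactly one element, else None."""
    for selector in candidates:
        try:
            matches = finder(selector)
        except Exception:
            # Treat a failing lookup (e.g. an invalid selector) as "no match".
            continue
        if len(matches) == 1:
            return selector
    return None


# Fake lookup table standing in for a headless browser (hypothetical data).
FAKE_DOM = {
    "//button[@id='buy-now']": [],               # id changed after a redesign
    "//button[text()='Buy now']": ["<button>"],  # visible text still matches
}


def fake_finder(selector: str) -> Sequence:
    return FAKE_DOM.get(selector, [])


healed = first_working_selector(
    fake_finder,
    ["//button[@id='buy-now']", "//button[text()='Buy now']"],
)
```

Returning `None` when nothing works leaves the decision (alert the user, ask a model for new candidates) to the caller.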

eddtries · on Oct 20, 2023

Cool! Good luck, it would help me with my work so I’m rooting for you :)

jondwillis · on Oct 19, 2023

This is amazing. I will try to have it automate my system of agents web app in a meaningful way (turtles all the way down.) Shameless plug: https://github.com/agi-merge/waggle-dance

BTW I don’t normally use Chrome- I know why you would start with that, but it would be great to see a Firefox or even Safari version of this.

jdthedisciple · on Oct 20, 2023

I checked out waggledance.ai - looks awesome what it can do and really interesting to me!

My question would be what does waggledance do that a GPT4 reply would not provide besides the (really well done) UI?

In my experience the way the tasks are generated and broken down is pretty much what I get when I ask ChatGPT.

jonasnelle · on Oct 19, 2023

Glad to hear it, let me know how it goes! And yes, other browsers are definitely in our future at some point

jondwillis · on Oct 19, 2023

I got stuck with an `autotab record` error after following the README setup instructions:

selenium.common.exceptions.WebDriverException: Message: Service /opt/homebrew/bin/chromedriver unexpectedly exited. Status code was: -9

jonasnelle · on Oct 19, 2023

Hm, that unfortunately sounds like a Chromium/ChromeDriver error, which can be somewhat finicky. I recommend checking that the versions you're using match (`chromedriver --version` and at chrome://settings/help). Let me know how it goes and happy to help debug further (though GPT4 is probably more helpful than I will be)
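
The version comparison can be scripted. A sketch, assuming version strings shaped like `ChromeDriver 118.0.5993.70 (...)` and `Google Chrome 118.0.5993.88`; the sample strings are made up, and comparing only the major version is an assumption about what counts as a match.

```python
import re
from typing import Optional


def major_version(version_text: str) -> Optional[int]:
    """Extract the major version from text such as 'ChromeDriver 118.0.5993.70 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def versions_match(chromedriver_text: str, chrome_text: str) -> bool:
    """True when both strings carry the same major version number."""
    driver_major = major_version(chromedriver_text)
    chrome_major = major_version(chrome_text)
    return driver_major is not None and driver_major == chrome_major


# Sample strings (made up for illustration).
ok = versions_match("ChromeDriver 118.0.5993.70 (abc123)", "Google Chrome 118.0.5993.88")
stale = versions_match("ChromeDriver 114.0.5735.90 (def456)", "Google Chrome 118.0.5993.88")
```

The ChromeDriver string could be captured with `subprocess.run(["chromedriver", "--version"], capture_output=True, text=True).stdout`.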

justusthane · on Oct 20, 2023

Wow! I haven't played with it much yet, but just wanted to say that waggledance.ai seems really, really cool. It seems like a really interesting idea, and I love the UI and general aesthetic!

jondwillis · on Oct 20, 2023

Thank you! It’s been fun to build and is getting even more fun now that I am adding skills and inter-agent interactions.

pedrolins · on Oct 19, 2023

This is amazing!! I've found myself writing selenium scripts to automate tasks for my dad's job (things such as getting a name from a spreadsheet, putting that name in a website's search box and from there repeating the same actions for 100s of names) and saved him a ton of time. Making browser automation more accessible by just showing the machine how to do it will definitely make lots of people's lives easier. Can't wait to mess around with it.
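
The loop described here (read a name, search for it, repeat) might look roughly like this. The CSV column name `name` and the `search` callable are assumptions; in a real script `search` would drive the browser, for example by typing into the site's search box.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def read_names(csv_lines: Iterable[str], column: str = "name") -> List[str]:
    """Read non-empty values from one column of a CSV export of the spreadsheet."""
    names: List[str] = []
    for row in csv.DictReader(csv_lines):
        value = (row.get(column) or "").strip()
        if value:
            names.append(value)
    return names


def run_for_each(names: Iterable[str], search: Callable[[str], str]) -> Dict[str, str]:
    """Call `search` once per name and collect the results."""
    return {name: search(name) for name in names}


# In-memory stand-in for the spreadsheet (hypothetical data).
SHEET = io.StringIO("name,city\nAda Lovelace,London\n,\nAlan Turing,Wilmslow\n")

names = read_names(SHEET)
results = run_for_each(names, search=lambda name: f"searched:{name}")
```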

alexirobbins · on Oct 19, 2023

Interesting, sounds like the kind of stuff we're excited about helping with! Would love to hear what you think once you try it, email is in my bio if you're open to a chat.

keepamovin · on Oct 20, 2023

This is very cool! I want to host this on a server and deliver it using BrowserBox so you can build automations on mobile.

This was what I originally created BrowserBox for--a delivery platform for web scraping authoring tools so you're not limited by extensions--but got into building the remote browser as a product, and haven't got around to the automation authoring tool yet, haha!

Thank you for creating this, and making it MIT. AI-augmented human guided web scraping authoring is definitely the future in this space I think!

Great name, too!

autotab

Very cool!

I was already calling my up-and-coming BrowserBox SaaS Cloudtabs so there's .... "synergies" hahaha! :)

https://github.com/BrowserBox/BrowserBox

jonasnelle · on Oct 20, 2023

Thanks! Excited to see what you do with it

BrowserBox looks cool, I couldn’t tell from the README what browser engine it uses. Did you build your own?

keepamovin · on Oct 20, 2023

Thank you, Jonas!

No, building that wouldn't give us anything. We use Chrome. Also works with anything else derived from Chromium, so that's Edge, Brave, etc.

Chrome runs headless on the server, instrumented with Chrome DevTools Protocol^0. The cool thing about using the DevTools protocol is you can customize so much. CDP is basically like a superset of the extensions APIs. In our case, one thing it's useful for is fine control over live-streaming the tabs.
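
For readers unfamiliar with CDP: commands are JSON messages with an `id`, a `method`, and `params`, sent to the browser over a WebSocket. A sketch of building two such messages. `Page.navigate` and `Page.startScreencast` are documented CDP methods, but the parameter values are illustrative and this is not BrowserBox's code.

```python
import itertools
import json
from typing import Any, Dict, Optional

_next_id = itertools.count(1)


def cdp_command(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one CDP command; each message carries its own integer id."""
    message = {"id": next(_next_id), "method": method, "params": params or {}}
    return json.dumps(message)


navigate = cdp_command("Page.navigate", {"url": "https://example.com"})
screencast = cdp_command("Page.startScreencast", {"format": "jpeg", "quality": 60})
```

The browser echoes the `id` back in its response, which is how a client pairs replies with requests.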

You must be super busy with everything you're doing, if you ever have any questions about anything feel free to launch an email to me at cris@dosyago.com -- I wish you the best with autotab!

0: https://chromedevtools.github.io/devtools-protocol/

lucgagan · on Oct 20, 2023

I am building _exactly_ the same thing for Playwright over at https://ray.run/. I think this is the future of writing tests no doubt. Planning to launch next week.

By the looks of it, I am taking a slightly different approach than you. I am using LLM only to identify elements, but the actual generators are created using https://ray.run/browser-extension

wackget · on Oct 20, 2023

FYI your animated background should be disabled when your visitor [prefers-reduced-motion](https://developer.mozilla.org/en-US/docs/Web/CSS/@media/pref...).

Edit: or just disabled in general cos it's super annoying

brianjking · on Oct 24, 2023

Ooh, ty! Going to ping you about this.

typosaur · on Oct 19, 2023

How is this different from the "Recorder" feature available in Chrome (Dev Tools > Recorder)?

It can record a user-journey and print out a puppeteer script.

jonasnelle · on Oct 19, 2023

The main difference is probably that autotab outputs a Python script, which is easier to integrate with existing Python codebases.

Going forward I think the flexibility of Python code & the Python ecosystem will make it easier to build a bunch more things we want to do. For example, a lot of browser-based work is not totally deterministic but rather requires more intelligence during the automation, e.g. summarizing information on a web page or dynamically filling in a text box with the right answer.

typosaur · on Oct 19, 2023

Wow! I can totally see that now. Basically you want to build a more intelligent browser-automation tool :)

Havoc · on Oct 19, 2023

Wow - did not know about that feature. Thanks for highlighting it!

Freddie111 · on Oct 19, 2023

I was curious about the YC backing and realised that you're the founders of https://www.ztool.co. Did you pivot or is this a complementary product to ztool? Would love to hear more about your journey :)

jonasnelle · on Oct 19, 2023

Hey, nice detective work :)

This is definitely more of a pivot, though the ZTool product also went through a few bigger iterations itself. I think my learnings from building and talking to users the last few months boil down to two main things that led me to work on autotab.

    1. AI-generated software isn't ready for non-technical users. With ZTool I was initially focused on making it easy for people who don't know how to code to create automations. For the reasons we talked about above, I think having the model's output be code is best for now, so autotab is focused on users that can review and tweak Python code.
    2. How you communicate intent matters. There is a surprising amount of mental work required to go from "this task is annoying" to creating a structured representation of it (c.f. the whole world of process automation). AI demos have focused on short prompts to communicate intent, but those don't work that well for more complex domains. Using the browser to communicate intent seems really powerful because it's so intuitive/familiar.

btbuildem · on Oct 20, 2023

> This will launch a Chrome session controlled by Selenium and then log you in to Google

Why is logging into Google a requirement?

jonasnelle · on Oct 25, 2023

Fyi we fixed this so now you can get an API key at autotab.com/dashboard and then that is sufficient to authenticate when you run `autotab record`, no more logging in to Google in the Selenium browser!

jonasnelle · on Oct 20, 2023

We currently only allow log in with Google for logging in to autotab itself, but will add login with username/password and other options soon! Depending on feedback we might also build a bring your own Open AI API key option

btbuildem · on Oct 20, 2023

Ah that makes sense. I missed the API auth part, assumed BYO(OAAK) was the default

eshack94 · on Oct 20, 2023

Sorry if this is self-explanatory to most, but what does `BYO(OAAK)` mean? Thanks!

vincengomes · on Oct 21, 2023

I'm also seeing it for the first time.

But I assume it means Bring Your Own (Open AI Authentication Key).

eshack94 · on Oct 21, 2023

That makes sense, thanks! I assumed BYO was for "Bring your own" but wasn't sure about the 2nd part.

gHA5 · on Oct 19, 2023

Does this allow me to record something like a workflow or macro I can use as a shortcut in an open browser window or is it restricted to an automated process of opening a browser window and executing some actions?

Specifically, assume I have a bunch of open tabs, each with an identical element, say the Full Screen button on Youtube video pages. Can I record "Exit Full Screen, move to next tab, enter Full Screen of the video on the page" with Autotab?

alexirobbins · on Oct 19, 2023

You could do a version of this if you always open your Youtube videos in the browser instance your autotab script spins up, but short answer is not really. Macros like this are super interesting though, and we've talked a lot about ways browsers could be designed for AI and automation like this.

jonasnelle · on Oct 19, 2023

This is very interesting, may I ask what macros you would define? Definitely something I'd like to learn more about

gHA5 · on Oct 19, 2023

The example I've given is really the only situation so far where I desired such functionality. I know very little about web stuff and I don't know if there is a term for what I'm looking for, so I can't give you any pointers. Macros and workflows are just similar concepts I know.

omarfarooq · on Oct 20, 2023

What would make this maximally useful is if an LLM with a ruleset would be available to parse and act on dynamic situations on pages.

For example, add to cart, if the LLM detect it's out of stock (a rule), then do some other action.

What's the possibility of LLM-reinforced, rules based branching logic like this being possible with your software in the future?
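
The add-to-cart example could be structured as a small rule table plus a classifier. In this sketch the classifier is a keyword check standing in for an LLM call, and all labels and action names are hypothetical.

```python
from typing import Callable, Dict

# A rule maps a label the model may return to the action to take (hypothetical names).
RULES: Dict[str, str] = {
    "in_stock": "add_to_cart",
    "out_of_stock": "notify_me",
}


def choose_action(
    page_text: str,
    classify: Callable[[str], str],
    rules: Dict[str, str],
    default: str = "stop",
) -> str:
    """Ask the classifier for a label, then branch on it via the rule table."""
    label = classify(page_text)
    return rules.get(label, default)


def keyword_classifier(page_text: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    return "out_of_stock" if "out of stock" in page_text.lower() else "in_stock"


action = choose_action("Sorry, this item is Out of Stock.", keyword_classifier, RULES)
```

Keeping the rules in plain data and falling back to a default for unknown labels is one way to keep the model's decision auditable.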

jonasnelle · on Oct 20, 2023

Couldn’t agree more! Working on this actively. Interesting questions include 1) what format is most effective for the user to convey their intent (maybe not point & click) and 2) how to represent the model output such that it is auditable and editable.

omarfarooq · on Oct 20, 2023

Awesome to hear, it's already useful now for us to get started with using it, but this kind of evolution would be a panacea.

I suggest a Discord group for Autotab to start building up a community! Looking forward.

alexirobbins · on Oct 24, 2023

We just set up a Discord! https://discord.gg/seGGxSUgzM

mkonecny · on Oct 19, 2023

Curious which part of this uses AI. Doesn't it just track mouse / keyboard events across the session?

monkpit · on Oct 19, 2023

AI looks to be involved in creating element selectors.

alexirobbins · on Oct 19, 2023

Selectors have been our primary focus so far – they're notoriously finicky! Our roadmap includes more extensive use of AI, both as embedded intelligence, and in the code generation process. For example, one thing we've heard from heavy users of browser automation is that maintenance becomes the largest cost. Self-healing automations will be able to either fix themselves, or give you an alert with a suggested fix to work off of.

jamesmcintyre · on Oct 19, 2023

The "self-healing" sounds very interesting. I've tried to think, myself, how to approach this in a chrome extension running dom selectors in automations. Curious if you have any high-level thoughts/findings in this area?

alexirobbins · on Oct 19, 2023

We're just getting started on it ourselves but it's a really fun problem. I think the useful thing from our findings so far is that simplifying the DOM representation really helps the model reason about state.
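
One way to simplify a DOM before handing it to a model, sketched with the standard library only: drop noisy subtrees (`script`, `style`, and so on) and keep just a few attributes. The choice of tags and attributes is an assumption, not what autotab does.

```python
from html import escape
from html.parser import HTMLParser
from typing import List

KEEP_ATTRS = {"id", "name", "role", "aria-label", "type", "href"}  # assumed useful
SKIP_TAGS = {"script", "style", "svg", "noscript"}
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}


class DomSimplifier(HTMLParser):
    """Re-emit HTML keeping only a few attributes and dropping noisy subtrees."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        kept = "".join(
            f' {key}="{escape(value)}"'
            for key, value in attrs
            if key in KEEP_ATTRS and value
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def simplify(html: str) -> str:
    parser = DomSimplifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


SAMPLE = (
    '<div class="css-1x2y3z" data-reactid="42">'
    "<style>.a{color:red}</style>"
    '<button id="save" class="btn btn-primary" style="margin:0">Save</button>'
    "<script>track()</script>"
    "</div>"
)
simplified = simplify(SAMPLE)
```

On the sample, the generated class names, inline styles, and script content are gone, leaving only the structure, the `id`, and the visible text.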

invertednz · on Oct 19, 2023

I'm confused: the demo shows typing in to select an element in a row, which looks to be AI, but I don't see anything that looks to be AI in the selectors? I'm not even sure how you would work with selectors unless you put the whole html into the context window or just ask which locator looks most reliable?

jonasnelle · on Oct 19, 2023

That’s exactly what we do - we sample relevant parts of the DOM and use the model to write the logic for selecting that element. This works pretty well and saves a lot of time that developers otherwise spend inspecting the html structure to write the selectors themselves.

Going forward we’re excited to experiment with more intelligence at runtime e.g. using AI to try to recover if the selector wasn’t found.

monkpit · on Oct 19, 2023

So I assume the video is the ground-truth, then the AI has access to the DOM and the video, and generates a selector based on the video during the test run (each time) in order to avoid flakiness due to DOM/class/attribute changes?

jonasnelle · on Oct 19, 2023

Right now the generated script is the ground truth but we’ve been working on augmenting this with images & videos to fall back on. We think defaulting to code is good because it is faster, cheaper and easier to reason about in the 95%+ of times it works. Plain old Selenium will get you pretty far, especially if creating scripts is much easier.

matt3D · on Oct 19, 2023

How comfortable should I feel about having this record the input of potentially sensitive information if it's going to send those inputs off to an LLM?

One way I can see around it is to not put anything sensitive in during the record, but sometimes I might need to enter a password etc.

alexirobbins · on Oct 19, 2023

You can pause recording at any time. For passwords we recommend using the login tools in our starter repo, check out the example config here: https://github.com/Planetary-Computers/autotab-starter/blob/...

For sensitive information that appears in the DOM during recording, there is a chance it could be included in a prompt to the LLM. We're using OpenAI via API, which is SOC 2 & 3 compliant and does not use data for model training (supposedly) https://trust.openai.com

seabass · on Oct 19, 2023

This is a great use of the sidepanel API. Really like the idea and how you implemented it. The Firefox equivalent to that API isn’t identical to Chrome’s, and I believe that there is currently no way to trigger the side panel to open programmatically unless it’s in response to a user action. So it may take a bit of work, but would be exciting to see support for FF in the future.

A bit tangential but I was curious what you used to record the demo video on your landing page with those zoom-in animations during critical moments. I’d like to record something like that for some of my side projects and thought your video looked rather polished.

jonasnelle · on Oct 19, 2023

Thanks! It's still very much early days for the Sidepanel and the API feels very much in flux but an exciting new form factor for browser-integrated experiences. Makes total sense, supporting other browsers is likely something we will do in the future

I used https://www.screen.studio/ for the demo and past demos, it works quite well I find though it can get a bit too excited on the zoom ins and I have to cut out some of them

loeing · on Oct 19, 2023

I'm working on something kind of similar but for Appium. This is excellent work!

hugs · on Oct 20, 2023

Tell me more! Have a link?

atum47 · on Oct 19, 2023

Forgive my ignorance, but isn't the chrome extension market kind of dead?

I had some silly extension a long time ago but after receiving a lot of emails from Google saying that they were killing the ecosystem I've dropped them.

jonasnelle · on Oct 19, 2023

autotab is not an always on browser extension like you are probably used to. autotab is only added to the Selenium-controlled browser window you use to write the automation while you're recording.

Also my experience differs more broadly, I use browser extensions quite a bit, e.g. for ad blocking and password management. Afaik Honey was basically a browser extension and was bought by Paypal for $4B.

atum47 · on Oct 19, 2023

I think this is the email I got that led me to believe extensions were dead:

Now: You can no longer create new paid extensions or in-app items. This began as a temporary restriction in March 2020 due to the COVID-19 situation. This change is now made permanent.

December 1, 2020: Free trials are disabled. The "Try Now" button in CWS will no longer be visible, and in-app free trial requests will result in an error.

February 1, 2021: Your existing items and in-app purchases can no longer charge money with Chrome Web Store payments. You can still query license information for previously paid purchases and subscriptions. (The licensing API will accurately reflect the status of active subscriptions, but these subscriptions won’t auto-renew.)

At some future time: The licensing API will no longer allow you to determine license status for your users.

I have the whole thing in my inbox if anyone is interested

j0hnyl · on Oct 20, 2023

This all just means that you need to roll your own monetization functionality. I think there are still people building businesses on top of browser extensions.

xnx · on Oct 19, 2023

What distinguishes this from UI.Vision's (https://ui.vision/) session recorder?

jonasnelle · on Oct 19, 2023

I'm not familiar with UI.Vision but from looking at their website briefly it looks like the resulting automations are part of their system/UI. We think that output as code is the way to go, so that you don't have to learn a new UI/language, aren't locked in and can integrate it into a larger project.

HanClinto · on Oct 19, 2023

Thanks for sharing this! I'm interested in seeing what you've built and understanding it better. I think that your goals and philosophical guideposts make a lot of sense. I see the search application on your homepage, but I'd like to see other examples to spark imagination for how I might practically use this myself.

-However, I'm getting a 404 on your Github link -- is that `autotab-starter` repo private?- (n/m -- looks like it's working now!)

alexirobbins · on Oct 19, 2023

fixed! thanks for the heads up

taneq · on Oct 20, 2023

Amusingly I've just started playing with llama.cpp, and the first successful test (after some issues with AVX instructions on my test VM) was getting Vicuna 30b to complete "The result of your test is":

> "The result of your test is: You should not use Selenium WebDriver for testing web applications. Other automated testing tools such as Cucumber, RSpec or Jest may be more suitable for your needs. [end of text]

Seemed a little mean. :P

jimmySixDOF · on Oct 23, 2023

My understanding is some of the LLM tooling is heading in this direction like Langchain and Open Interpreter are working on a screenshot to agent action function capability. Hot space !!

mthoms · on Oct 20, 2023

Hopefully you're still monitoring this page:

Do the HTML contents of the page get sent to your server for processing? What guarantees are there over privacy?

One of the uses I see for this is in helping users scrape their own transactions from banking and other financial websites.

jonasnelle · on Oct 20, 2023

Yes, we process the contents of the page when users are recording on our servers but don’t store the page HTML. I don’t see any world in which we would sell data, but in case you’re worried about the security of our servers I’m not sure there’s much I can say other than that we use standard security best practices including SSL, 2FA on cloud accounts etc. Does that answer your question?

mthoms · on Oct 21, 2023

It does, thanks.

kazinator · on Oct 20, 2023

Sorry, name taken: https://www.kylheku.com/cgit/c-snippets/tree/autotab.c

:)

kamray23 · on Oct 20, 2023

All names are taken, you just have to become more popular to win it.

eshack94 · on Oct 20, 2023

Unless it's copyrighted/trademarked, in which case you have legal ground to stand on (assuming both are proprietary services, which may not apply here).

clay09 · on Oct 21, 2023

Nice work! What are some differences between this project and Katalon Recorder? Katalon Recorder is also a chrome extension, allows recording of actions, and export to Python.

jonasnelle · on Oct 21, 2023

Katalon looks like a cool project, hadn’t heard of them before! We’re more focused on automations as opposed to testing, which is most important for where we take the product next. More focus for us on handling tasks that are less deterministic/require a bit more intelligence.

dbmikus · on Oct 20, 2023

Congrats on the launch! This definitely seems useful for some stuff I've been doing, like filling out a CRM spreadsheet with the right user data.

jonasnelle · on Oct 20, 2023

Thanks so much! Glad to hear it

dr_kiszonka · on Oct 19, 2023

Hey, it looks very neat. I was wondering if Autotab currently supports logging into websites that require 2FA. (Your readme mentions Google only.)

jonasnelle · on Oct 19, 2023

Hey, we're looking to build out our library of authentication plugins. Many websites offer "Sign in with Google" which makes things easier, you can just specify that in the autotab.yaml file. If you have a specific website in mind, I'd be happy to add support for it!

dr_kiszonka · on Oct 20, 2023

Thanks for the reply! I was thinking about an internal website. I don't know if this is a reasonable suggestion, but maybe you could focus on common SSOs and 2FA providers like Duo and OneLogin?

jonasnelle · on Oct 20, 2023

Interesting! I think the way we’ll handle this is the same way we handle Google 2FA atm: We have you log in and do 2FA manually once, then we save the logged out cookies.

Why the logged out cookies? That way even if someone gets the cookies they still need the password, but the automation can log in with just your password, no 2FA needed.
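
Saving and restoring cookies can be sketched like this. The `FakeDriver` class and the cookie contents are stand-ins so the example runs without a browser; Selenium's WebDriver exposes `get_cookies()` and `add_cookie()` with the same shape, and with a real browser the page must already be on the cookie's domain before `add_cookie()` is called. How autotab itself stores cookies is not shown in the thread.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_cookies(driver: Any, path: Path) -> None:
    """Write the browser's current cookies to a JSON file."""
    path.write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver: Any, path: Path) -> int:
    """Add previously saved cookies to the browser; returns how many were added."""
    cookies: List[Dict[str, Any]] = json.loads(path.read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


class FakeDriver:
    """Minimal stand-in exposing the two methods used above."""

    def __init__(self, cookies=None):
        self._cookies = list(cookies or [])

    def get_cookies(self):
        return list(self._cookies)

    def add_cookie(self, cookie):
        self._cookies.append(cookie)


with tempfile.TemporaryDirectory() as tmp:
    cookie_file = Path(tmp) / "cookies.json"
    # Hypothetical cookie left behind after a manual 2FA login and logout.
    save_cookies(FakeDriver([{"name": "device_trust", "value": "abc"}]), cookie_file)
    fresh = FakeDriver()
    restored = load_cookies(fresh, cookie_file)
```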

derac · on Oct 19, 2023

I'd love if it could output Playwright code as well. I haven't tried it but if this works well it's a great concept!

jonasnelle · on Oct 19, 2023

Good to know! The comments have definitely made us update towards looking into Playwright again quite seriously

lobochrome · on Oct 20, 2023

Yeah - I moved my entire automation library over at the beginning of the year. The docker integrations and ease of switching out browsers without having to manage webdrivers individually sealed the deal. I'm also tentatively using their selectors a bit, although find I have to retreat to Xpath quite a bit. Playwright seems to be the obvious choice for testing - but automation who knows.

If you can be Playwright for automation instead of testing - that'd be a biiiig market. Lots and lots of scrapers out there and it's a market Microsoft is not interested in.

jonasnelle · on Oct 20, 2023

Interesting, can you say more about why you prefer Playwright’s selectors? Helpful to have some anecdotes from real experience!

lobochrome · on Oct 25, 2023

I don’t prefer them, I’m just playing. If they work, the readability of the code is just simpler vs xpath.

But it’s less than 5% of selectors across the code base

eddtries · on Oct 20, 2023

Playwright is definitely the industry standard testing tool for those who are likely to be interested in Autotab too - I'm a test engineer and the last time I used Selenium was probably 2017. Playwright is everywhere that's comfortable using AI-tooling!

jonasnelle · on Oct 20, 2023

Interesting! We are more focused on automations as opposed to testing - would you say the same applies there as well?

Also Selenium does have 10x the downloads of Playwright on PyPI - is Python different or do you think that metric is misleading?

eddtries · on Oct 20, 2023

Selenium IS the more popular library, but it’s mostly used in more old-school places like banks and large corporations. This is fairly anecdotal but companies up for AI-guided paths in prod will more likely be on Cypress or Playwright, and from that Playwright fits in for your case because it’s also webdriver driven. The package actually includes a webdriver btw, so you don’t have to ask users to manage that themselves (for example brew chromedriver has to have permission changes to run if you follow your readme, which playwright would avoid).

Edit: as they both use the same interface it might not be too bad to support both?

shantnutiwari · on Oct 20, 2023

How is this different from Playwright? There, you can record your steps and get a Python script?

eshack94 · on Oct 20, 2023

This is super cool!

What are some use cases that you recommend this for, outside of testing browser workflows?

jonasnelle · on Oct 21, 2023

We’re actually mainly focused on automations more than testing. We want to build autotab into your intern/assistant for browser-based tasks that you would rather have someone else do.

For example, autotab helped a designer automate client handoffs. He set up an automation that exported assets from Figma and uploaded them to Google Drive, updated his CRM and notified the client.

revolt2tech · on Oct 20, 2023

Replace the GitHub repo with an executable Electron app, beat the world.

jonasnelle · on Oct 20, 2023

How’d you guess my secret plan? ;) Definitely in the cards going forward

tr33house · on Oct 19, 2023

Awesome. I'm mostly curious what the path to monetization for this is

jonasnelle · on Oct 19, 2023

I'm more interested in making it useful than extracting value.

Longer term I think you'd monetize the infrastructure to run the automations at scale in the cloud.

pkaye · on Oct 19, 2023

What AI does this use?

jonasnelle · on Oct 19, 2023

We use OpenAI's GPT4 to understand the position of the element being interacted with relative to the structure of the website overall. Lots of cool things we want to try out with open source models, but for now GPT4 is good enough.

nomilk · on Oct 20, 2023

Could this work for other languages, for example, R?

jonasnelle · on Oct 20, 2023

Unfortunately we have no plans to support R at the moment, though in my experience GPT4 is pretty good at translating code between languages

boromi · on Oct 20, 2023

macOS only? Wanted to try this on Windows

Tokumei-no-hito · on Oct 19, 2023

why does it require logging into google to use?

alexirobbins · on Oct 19, 2023

We plan to allow users to login with other accounts / signup with email + pw, just haven't set that up yet :-)

mherrmann · on Oct 19, 2023

I don't get why people still use XPaths, CSS selectors or HTML IDs to identify elements, even when they are "recorded". Please please please just use my https://github.com/mherrmann/selenium-python-helium instead. It makes so much more sense.

antman · on Oct 19, 2023

Let me take the opportunity to really thank you for this. It has been my go-to library for scraping, while also allowing Selenium syntax, and the idea behind it is really great. Also the “elements below …” feature is very nice.

Please make the switch to selenium 4, so that it is kept up to date with the rest of the selenium ecosystem e.g. undetectable browsers

shmoogy · on Oct 19, 2023

I haven't found anything that made me really need to try other things, but the fact that it's just a wrapper and sounds like it handles nested iframes better means I'm trying this next time I need to make a quick selenium script. Thank you

monkpit · on Oct 19, 2023

This is the idea behind Testing-Library from what I understand.

https://testing-library.com/

potamic · on Oct 19, 2023

Why wouldn't you use CSS selectors? They are unambiguous, concise and most importantly, a standard. They should be the conventional way to refer to elements.

bdcravens · on Oct 19, 2023

The CSS selector isn't the problem, it's how you identify it. Many sites have dynamically named selectors, including class names. Even depending on ordering is fraught. Learning how to create robust selectors is about 1/2 the battle in writing a good Selenium/Puppeteer/Playwright/etc script. (been doing this as a major part of my day job for about 14 years)

mherrmann · on Oct 19, 2023

The CSS structure of a web site is an implementation detail and on a global average, CSS selectors break much more often than user-visible labels.

monkpit · on Oct 19, 2023

> and most importantly, a standard.

A standard for what, exactly? Selecting HTML elements? You could say the same about XPath.

Both are implementation details tightly coupled to the code instead of being driven by the “human” elements of the UI, such as the visible text, labels, etc.
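
To illustrate selecting by what the user sees rather than by class names, here is a sketch that builds an XPath from a tag and its visible text. The helper names are hypothetical; `normalize-space()` and `concat()` are standard XPath 1.0 functions.

```python
def xpath_literal(text: str) -> str:
    """Quote a string for use inside an XPath 1.0 expression."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # XPath 1.0 has no escape character, so mix quoting styles via concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def by_visible_text(tag: str, text: str) -> str:
    """Build an XPath that matches an element by its trimmed visible text."""
    return f"//{tag}[normalize-space()={xpath_literal(text)}]"


sign_in = by_visible_text("button", "Sign in")
deals = by_visible_text("a", "Today's deals")
```

A selector like this survives a class-name change but breaks when the label is reworded or translated, so it trades one kind of fragility for another.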

alexirobbins · on Oct 19, 2023

Helium seems great! Handling popups sounds really clever, how do you pull that off?

abdusco · on Oct 19, 2023

Playwright is another, really good alternative

bdcravens · on Oct 19, 2023

It is, but the selector issues are the same. (though it does make it much easier to execute scripts in the context of the page when standard selectors don't work)

tmerse · on Oct 19, 2023

> It is, but the selector issues are the same

I disagree. Playwright's locators are pretty powerful compared to standard css/xpath selectors.

For example it includes layout based selectors.

https://playwright.dev/docs/other-locators#css-matching-elem...

sigg3 · on Oct 19, 2023

This looks great, thanks

pid-1 · on Oct 19, 2023

Related: https://playwright.dev/ has a built-in code generator. In *my opinion*, it's also more pleasant to work with than Selenium.

jonasnelle · on Oct 19, 2023

Interesting, what features are most impactful in making it more pleasant to work with?

We picked Selenium because it seems much more widely used at least in the Python world (selenium has about 10x more downloads on PyPI than playwright). My guess is that's at least partly because Playwright is more focused on the JS/TS ecosystem and testing.

LordKeren · on Oct 19, 2023

I recently finished moving from Python-selenium to Python-playwright. Though my outcomes might be different, it is such a significant improvement that I would strongly, strongly recommend at least spending a couple days trying it out if you are familiar with selenium.

To my team, selenium’s only advantage in the python ecosystem is that it is easier to hire people with experience. However, anyone familiar with selenium is likely to pick up playwright extremely quickly anyways

Playwright does ship a pytest integration but it is not required

Some highlights:

1. Better waiting — everything is auto-waited and auto-retried by default

2. Easy to install browsers — no need to get separate webdriver browsers. Just run “playwright install chromium”

3. Full, accurate typing

4. Trace viewing to step through a script execution

5. Async support

6. (Arguably) more pythonic syntax and easier to pick up/ train people
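
To make the first point concrete: auto-waiting amounts to retrying a check until it succeeds or a deadline passes. A rough sketch of such a loop as a generic helper; the names are hypothetical and this is neither library's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> T:
    """Call `probe` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulated element that only "appears" on the third poll.
attempts = {"count": 0}


def element_appears() -> Optional[str]:
    attempts["count"] += 1
    return "<button>" if attempts["count"] >= 3 else None


found = wait_until(element_appears, timeout=2.0, interval=0.01)
```

With Selenium this kind of wait is typically written explicitly around each lookup, which is the boilerplate the comment above says Playwright removes.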

I really enjoyed mastering selenium over the years, but I struggle to think of a use case for it outside of maintaining legacy script suites anymore. Playwright just does it better.

jonasnelle · on Oct 25, 2023

Have been digging into Playwright a bit - can you say more about the trace viewing to step through a script execution? Very curious!

jonasnelle · on Oct 20, 2023

That’s super helpful, thanks for explaining!

jasonjmcghee · on Oct 20, 2023

Wildly better developer experience. Playwright is also much less flakey and easier to work with than cypress, another alternative to selenium.

Selenium is quite outdated, though still widely used. It's much older. I would look at downloads over time / rate of change, not total downloads.

My hot take- don't hook yourselves to outdated tech.

Don't take my word for it- go try out selenium, Cypress, and playwright and draw your own conclusions.

hugs · on Oct 20, 2023

fwiw, the Selenium project is now old enough to go to college (19 years this month!); it's not ready to retire just yet. (Side-note: Google Chrome is 15 years old. Age is just a number.) Selenium is still learning from other projects and implementing new things. Specifically, check out the WebDriver BiDi project, adding a bidirectional protocol which was the core of what made Playwright and Puppeteer faster. Also, Selenium devs are working with the W3C to make this work for everyone.

https://github.com/w3c/webdriver-bidi

https://w3c.github.io/webdriver-bidi/

Google wrote a good explainer about WebDriver BiDi here:

https://developer.chrome.com/articles/webdriver-bidi/

johnsutor · on Oct 19, 2023

Similarly, https://github.com/AndrewUsher/playwright-chrome-recorder

bdcravens · on Oct 19, 2023

True, but like the Chrome extension in the sibling comment, it often hard-codes references, which may not be the same in the next run.

on Oct 19, 2023

[deleted]