Private Server (host it yourself) and default mode is "capture", not just bookmarks

Hi, funny I’m in here making product suggestions. I’m not a regular community forum poster, and I haven’t even put this product through its paces. But I’ll start using it in the next few days.

But I am excited to talk about my “dream” product:

  • Client-side (e.g., browser plugin / HTTP proxy) for capturing and searching / viewing on macOS / Linux desktop / Android (at least, those are my priorities). Sounds like Memex is well on its way with this part.
  • 100% effective HTML replay. There’s open source out there that’s “pretty good” already
  • Everything is captured. If I see it, it’s in my index. It’s 2021, I don’t want to even click on “bookmark this” and figure out how to organize it etc. Perhaps a blacklist for a handful of things, but… the more I think about it, I have a hard time coming up with anything that absolutely shouldn’t be in the index. DRM would be a pain, but… that could be tuned over time I guess. Just skip content that is protected at the Widevine level; having a YouTube link instead of the video is better than nothing. Also, since the content is immutable, something that was captured might not initially be viewable / indexable but could become viewable after an update to the scraper component comes out.
  • Search experience is important: weighting recent data, etc. At least, not the default Lucene experience :slight_smile:
  • The above parts are free and open source.
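
To make the “weighting recent data” wish above concrete, here is a minimal sketch of one common way to do it: multiply a text-match score by an exponential decay on the age of the capture. The function name `recency_weighted` and the 30-day half-life are purely hypothetical choices for illustration; nothing here is taken from Memex or Lucene.

```python
def recency_weighted(score: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Scale a relevance score so that it halves every `half_life_days`.

    `score` is whatever text-match score the search engine produced;
    `age_days` is how long ago the page was captured.
    Both the name and the default half-life are assumptions for this sketch.
    """
    return score * 0.5 ** (age_days / half_life_days)


# A strong match from a year ago versus a weaker match from this week.
old_strong = recency_weighted(0.9, age_days=365)
recent_weak = recency_weighted(0.6, age_days=3)
```

With these numbers the recent, weaker match ends up ranked above the year-old strong match. A longer half-life would soften that effect.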

Here’s what I would pay for, and could be customer-only source:

  • Well-tuned server side setup delivered as CDK or Terraform or whatever, to fire up a private server, with proper key management and cert rotation. Choice of cloud provider would be nice, but optimizing on one cloud is probably more important, since I have to pay the cloud bill as well. Just a few knobs for things like how to age items out of the index, things that are cost related; the rest would be settings within the “app” or browser plugin or whatever it is… Personally I could deal with a CloudFormation zip, but most customers probably can’t or wouldn’t want to… so maybe it has to be one of those cross-account setups where I give Memex the IAM perms and DNS access. Memex boots everything and keeps it running, but you can’t get at my KMS keys that encrypt the index. :slight_smile: When I think about it, everyone who has an Amazon account also has an AWS account with a credit card set up. Also, 10 years ago, my mom set up a backup service for her Mac which involved creating an S3 bucket and issuing keys. And she’s not particularly “technical”. So the private cloud hosting part of this may not be that “far out” there.
  • Hmm, I think there are papers about search technology where the index and the search term are both encrypted and unknown to the DB, but is it actually usable tech? Maybe instead, focus initially on keeping the content size small enough that you can keep the last 15 days on each client. When you take out video / voice, the amount of data goes down fast, right?
  • Repeatable builds that checksum and do some kind of self-attestation that the source I see is what is running for every component involved :white_check_mark:. With build logs I could review etc. Per-customer private iOS and Android builds that have embedded certs to connect to my server env? I don’t know why that would be better, but it sounds neat.

I think I used to pay $99/year for Evernote. I could see paying something in that range. It’s the Let’s Encrypt model… if you strictly only ship after the operations are automated, you can take on a huge number of clients.

Possible to build iteratively and quickly and without taking VC money? That is a tough question…

One last thing, the premise of “the content is strictly private” has an impact on this from top to bottom. Personally, I would be happy with that. But I don’t know how many of me there are. Journalist types, data hoarders… students? Everybody would like having it though. Like, I always intend to download PDF statements for dozens of paperless bank accounts, but never do it. This service would just about automate that problem. That hunky guy who hid his account on dating.com? You’ve got his content to obsess over as long as you like. (Let’s not make this too weird though…)

Want to “share” something? Copy and paste the link or the content. Around the time “share this” was added to Evernote was when it started feeling bloated.

Thanks for reading, now I have to go actually launch the product that already exists, heh.

Ethan

Oh wow, thanks for adding all of this!

We definitely can’t serve all of those needs, but maybe I can point you to other products that are more suitable for that.

  • Cross device: On our way there. We just had a meeting discussing our next steps in providing a 3+ device sync. You can already sync between one browser and a phone.
  • HTML replay: http://webrecorder.io/
  • Full-Text History index: We had that but had to deprecate it for now. Reasoning here. You may wanna check out historysearch.com.
  • Search experience: Results are ranked by title > content > time. Right now it’s not a priority to improve this much, but we are raising funds to do so. Ideas so far: better content type filters, filters for “is annotated”, ranking by visit duration, fuzzy search, tag AND/OR search. Which one do you feel is most important to the search queries you expect to run daily (or at least multiple times a week)?
  • Server side setup: We have plans to make things more self-hostable, but not anytime soon. We first wanna manage to make this a profitable business and solve some of the more important UX challenges.
  • Encrypted searching: Last time I checked searching over encrypted information was not ready for prime time yet.
  • Repeatable builds: Yeah that would be an ideal state to reach, especially for privacy focused people. I don’t think we are anywhere near that though. Good thing is that Firefox is actually checking our builds against the source code before releasing them to the store.
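
For readers wondering what a “title > content > time” ranking could look like in practice, here is a minimal sketch under stated assumptions: a plain substring match, with title hits sorted above content-only hits and the most recent visit used as the tie-breaker. The `Page` class and `rank` function are hypothetical names for illustration only and are not Memex’s actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    # Hypothetical record of a saved page; field names are assumptions.
    title: str
    content: str
    visited: datetime


def rank(pages: list[Page], term: str) -> list[Page]:
    """Return matching pages ordered by title match, then content match, then recency."""
    needle = term.lower()
    hits = [
        p for p in pages
        if needle in p.title.lower() or needle in p.content.lower()
    ]
    # Tuples compare element by element, so title matches always win over
    # content-only matches, and visit time only breaks ties.
    return sorted(
        hits,
        key=lambda p: (needle in p.title.lower(), needle in p.content.lower(), p.visited),
        reverse=True,
    )


pages = [
    Page("Rust book", "learn the language", datetime(2021, 1, 1)),
    Page("Notes", "rust tips", datetime(2021, 3, 1)),
    Page("Rust news", "weekly digest", datetime(2021, 2, 1)),
    Page("Cooking", "pasta recipes", datetime(2021, 4, 1)),
]
results = [p.title for p in rank(pages, "rust")]
```

Here the two title matches come first (the more recently visited one on top), the content-only match follows even though it is newer, and the non-matching page is dropped. Ideas such as fuzzy search or ranking by visit duration would add further elements to the sort key.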

RE PDF statements: We are currently not doing any archiving of pages or PDFs, and not planning to do so anytime soon.

RE building iteratively and quickly: Yeah, that is possible, but not for features that have low user demand (like those self-hostability and hard privacy features). Because we don’t take VC money, we need to be much more focused on activities that generate revenue immediately, so we have the funds later to transform the product to be more privacy focused and self-hostable.

Thanks Ethan!
I’d be curious to understand a bit more about the way you intend to use Memex. Would you be interested in jumping on a 15-30min call with me someday? If so, you can pick a time here: calendly.com/worldbrain

Cheers
Oli

Thanks for your reply to my rant. And the product hints. I’ve already discovered Webrecorder.io and I’m using it when I see content that looks like it won’t be around long.

Here’s how I found Memex. Randomly, I wanted to tweet some complaints about personal search indexes not being a thing. I tweet about once a month. I had found https://historio.us/ in the past, and was searching for it so I could link and say “like this except always capturing”. For whatever reason I couldn’t find it again (I bet the .us domain is working against them!), but I found Memex. And your forum was a much better… forum to get my complaining out.

Re: privacy is a niche market. I look at it as a growth opportunity. In the early days of social networking, we all gave our lives’ content away for free without thinking about it. Then, everyone realized what happened and got creeped out. But it was too late. Then Cambridge Analytica and a presidential race that was won using Facebook profiling… Now, how many Keybase accounts are there? Or ProtonMail? 100,000x more than Memex? (Yeah, I know, you’re just starting)…

To use your precious resources to add “social”… Is that really wise? I have a capture to share ratio of around 1000/1. And when I share, I send people the https link, not a getpocket wrapped version of the article.

Sorry, I shouldn’t be taking space by ~criticizing~ questioning your strategy when I barely know the space you’re in. I do appreciate how you have this forum and you’re obviously interested in what customers have to say.

On that note, what is the difference between Memex and historio.us?

(PS wow, just read the link, your original design was to record everything! Bravo!! And aww, that’s a bummer you had to let it go. Maybe it will come back. :pray:)

E

Sorry, I shouldn’t be taking space by ~criticizing~ questioning your strategy when I barely know the space you’re in. I do appreciate how you have this forum and you’re obviously interested in what customers have to say.

Don’t worry. It’s actually very appreciated hearing your candid and honest opinion on this.

May I ask why you have a capture to share ratio that is so low? What would make you share more? Which frictions are in the way?

We see that the single-user tool still has room to improve on its own - and we are right now raising the money to fix that.

However, we see enormous potential to improve social media and online-research collaboration workflows.
Right now the steps between curation, organisation and resharing are fractured, causing a lot of friction. Imagine you had something like Twitter with the organisation/search/annotation features of Memex, where you can with little friction save things to your knowledge store and reshare them in various contexts, or integrate into other workflows with tools like Roam, Notion, Wordpress or Zapier.

Now, how many Keybase accounts are there? Or ProtonMail? 100,000x more than Memex?

In the end, the need for privacy is context dependent. Email is highly private; online research is not necessarily. Keybase also only has 400k users, so it is still a niche compared to the market of knowledge management tools like Notion or Evernote, or social media tools like Twitter or Facebook.

So most people in the audience we think Memex is highly useful for would rather have a good user experience than high privacy.
Also, privacy in collaboration contexts is something that is 100x more tricky than in simple messaging like an email application.

Just think about all the various permission permutations in sharing lists, annotating and having discussions. To do those with a privacy-first mindset, you need the budget of a hyped-up crypto project.

I think what’s more important than hard core privacy (e.g. full encryption for everything) is data ownership and interoperability. So you should be able to take your data somewhere else easily and have little/no social lock-in (e.g. switching between WhatsApp/Telegram and still talking to friends using other services).

The problem is that both privacy and interoperability are incredibly hard/costly to build, and right now neither is fully within our budget. We’re trying the best we can by keeping things offline first and allowing you to access all your data.

We also need to distinguish privacy from data-driven business models.
What kind of privacy is really important? That no machine can see it? No person can see it? No data is used without consent? The answers to those questions all lead to very different design decisions and accessibility to users.

This is why we are focusing more on data ownership and interoperability as a principle to guide our development than on hard core privacy. The latter has been shown to create a subpar user experience and makes it very hard to keep a good iteration speed.

We also chose Steward Ownership as an investor model so we don’t come under pressure to exploit user data at some point for the sake of growth.
So we are probably already an order of magnitude less incentivised to apply Facebook’s ‘ethical’ practices to user data.

On that note, what is the difference between Memex and historio.us?

Apart from most features being premium, Historious offers some web archiving features. I love that they also let you share a searchable store of saved pages. That’s something I really wanna have for Memex too. It’s another use case that is impossible to do without a cloud and reduced privacy.

Also Historious has no notes or annotation features, and limited queryability.

Hope that helps :slight_smile: